Merge tag 'nf-24-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
author David S. Miller <davem@davemloft.net>
Fri, 12 Apr 2024 12:02:13 +0000 (13:02 +0100)
committer David S. Miller <davem@davemloft.net>
Fri, 12 Apr 2024 12:02:13 +0000 (13:02 +0100)
netfilter pull request 24-04-11

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

Patches #1 and #2 add the missing RCU read-side lock when iterating over
the expression and object type lists, which could otherwise race with
module removal.

Patch #3 prevents promiscuous packets from visiting the bridge/input hook,
 amending a recent fix that addresses a conntrack confirmation race
 in br_netfilter and nf_conntrack_bridge.

Patch #4 adds and uses an iterate decorator type to fetch the current
 pipapo set backend data structure view when netlink dumps the
 set elements.

Patch #5 fixes removal of duplicate elements in the pipapo set backend.

Patch #6 makes the flowtable validate the PPPoE header before accessing it.

Patch #7 fixes the flowtable datapath for PPPoE packets; otherwise the
         lookup fails and PPPoE packets follow the classic path.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
408 files changed:
.mailmap
Documentation/admin-guide/hw-vuln/spectre.rst
Documentation/admin-guide/kernel-parameters.txt
Documentation/devicetree/bindings/clock/keystone-gate.txt
Documentation/devicetree/bindings/clock/keystone-pll.txt
Documentation/devicetree/bindings/clock/ti/adpll.txt
Documentation/devicetree/bindings/clock/ti/apll.txt
Documentation/devicetree/bindings/clock/ti/autoidle.txt
Documentation/devicetree/bindings/clock/ti/clockdomain.txt
Documentation/devicetree/bindings/clock/ti/composite.txt
Documentation/devicetree/bindings/clock/ti/divider.txt
Documentation/devicetree/bindings/clock/ti/dpll.txt
Documentation/devicetree/bindings/clock/ti/fapll.txt
Documentation/devicetree/bindings/clock/ti/fixed-factor-clock.txt
Documentation/devicetree/bindings/clock/ti/gate.txt
Documentation/devicetree/bindings/clock/ti/interface.txt
Documentation/devicetree/bindings/clock/ti/mux.txt
Documentation/devicetree/bindings/dts-coding-style.rst
Documentation/devicetree/bindings/remoteproc/ti,davinci-rproc.txt
Documentation/devicetree/bindings/soc/fsl/fsl,layerscape-dcfg.yaml
Documentation/devicetree/bindings/soc/fsl/fsl,layerscape-scfg.yaml
Documentation/devicetree/bindings/timer/arm,arch_timer_mmio.yaml
Documentation/devicetree/bindings/ufs/qcom,ufs.yaml
Documentation/filesystems/bcachefs/index.rst [new file with mode: 0644]
Documentation/filesystems/index.rst
MAINTAINERS
Makefile
arch/arm64/kernel/ptrace.c
arch/loongarch/boot/dts/loongson-2k1000.dtsi
arch/loongarch/boot/dts/loongson-2k2000-ref.dts
arch/loongarch/boot/dts/loongson-2k2000.dtsi
arch/loongarch/include/asm/addrspace.h
arch/loongarch/include/asm/io.h
arch/loongarch/include/asm/kfence.h
arch/loongarch/include/asm/page.h
arch/loongarch/mm/mmap.c
arch/loongarch/mm/pgtable.c
arch/nios2/kernel/prom.c
arch/powerpc/include/asm/vdso/gettimeofday.h
arch/riscv/Makefile
arch/riscv/include/asm/pgtable.h
arch/riscv/include/asm/syscall_wrapper.h
arch/riscv/include/asm/uaccess.h
arch/riscv/include/uapi/asm/auxvec.h
arch/riscv/kernel/compat_vdso/Makefile
arch/riscv/kernel/patch.c
arch/riscv/kernel/process.c
arch/riscv/kernel/signal.c
arch/riscv/kernel/traps.c
arch/riscv/kernel/vdso/Makefile
arch/riscv/mm/tlbflush.c
arch/s390/include/asm/atomic.h
arch/s390/include/asm/atomic_ops.h
arch/s390/include/asm/preempt.h
arch/s390/kernel/entry.S
arch/s390/kernel/perf_pai_crypto.c
arch/s390/kernel/perf_pai_ext.c
arch/s390/mm/fault.c
arch/x86/Kconfig
arch/x86/coco/core.c
arch/x86/entry/common.c
arch/x86/entry/entry_64.S
arch/x86/entry/entry_64_compat.S
arch/x86/entry/syscall_32.c
arch/x86/entry/syscall_64.c
arch/x86/entry/syscall_x32.c
arch/x86/events/intel/ds.c
arch/x86/include/asm/coco.h
arch/x86/include/asm/cpufeature.h
arch/x86/include/asm/cpufeatures.h
arch/x86/include/asm/msr-index.h
arch/x86/include/asm/nospec-branch.h
arch/x86/include/asm/sev.h
arch/x86/include/asm/syscall.h
arch/x86/kernel/cpu/amd.c
arch/x86/kernel/cpu/bugs.c
arch/x86/kernel/cpu/common.c
arch/x86/kernel/cpu/mce/core.c
arch/x86/kernel/cpu/mtrr/generic.c
arch/x86/kernel/cpu/resctrl/internal.h
arch/x86/kernel/cpu/scattered.c
arch/x86/kernel/setup.c
arch/x86/kernel/sev.c
arch/x86/kvm/Kconfig
arch/x86/kvm/reverse_cpuid.h
arch/x86/kvm/svm/sev.c
arch/x86/kvm/vmx/vmenter.S
arch/x86/kvm/x86.c
arch/x86/lib/retpoline.S
arch/x86/mm/numa_32.c
arch/x86/mm/pat/memtype.c
arch/x86/virt/svm/sev.c
block/bdev.c
block/ioctl.c
drivers/acpi/thermal.c
drivers/ata/ahci_st.c
drivers/ata/pata_macio.c
drivers/ata/sata_gemini.c
drivers/ata/sata_mv.c
drivers/ata/sata_sx4.c
drivers/base/core.c
drivers/base/regmap/regcache-maple.c
drivers/block/null_blk/main.c
drivers/crypto/ccp/sev-dev.c
drivers/firewire/ohci.c
drivers/gpio/gpiolib-cdev.c
drivers/gpio/gpiolib.c
drivers/gpu/drm/display/drm_dp_dual_mode_helper.c
drivers/gpu/drm/drm_prime.c
drivers/gpu/drm/i915/Makefile
drivers/gpu/drm/i915/display/intel_display.c
drivers/gpu/drm/i915/display/intel_display_device.h
drivers/gpu/drm/i915/display/intel_display_types.h
drivers/gpu/drm/i915/display/intel_dp.c
drivers/gpu/drm/i915/display/intel_dp_mst.c
drivers/gpu/drm/i915/display/intel_psr.c
drivers/gpu/drm/i915/gt/gen8_ppgtt.c
drivers/gpu/drm/i915/gt/intel_engine_cs.c
drivers/gpu/drm/i915/gt/intel_gt.c
drivers/gpu/drm/i915/gt/intel_gt.h
drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c [new file with mode: 0644]
drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h [new file with mode: 0644]
drivers/gpu/drm/i915/gt/intel_gt_regs.h
drivers/gpu/drm/i915/gt/intel_workarounds.c
drivers/gpu/drm/nouveau/nouveau_uvmm.c
drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c
drivers/gpu/drm/nouveau/nvkm/subdev/devinit/gm107.c
drivers/gpu/drm/nouveau/nvkm/subdev/devinit/r535.c
drivers/gpu/drm/panfrost/panfrost_gpu.c
drivers/gpu/drm/xe/xe_device.c
drivers/gpu/drm/xe/xe_device_types.h
drivers/gpu/drm/xe/xe_exec.c
drivers/gpu/drm/xe/xe_exec_queue_types.h
drivers/gpu/drm/xe/xe_gt_pagefault.c
drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
drivers/gpu/drm/xe/xe_gt_types.h
drivers/gpu/drm/xe/xe_preempt_fence.c
drivers/gpu/drm/xe/xe_pt.c
drivers/gpu/drm/xe/xe_ring_ops.c
drivers/gpu/drm/xe/xe_sched_job.c
drivers/gpu/drm/xe/xe_sched_job_types.h
drivers/gpu/drm/xe/xe_vm.c
drivers/gpu/drm/xe/xe_vm.h
drivers/gpu/drm/xe/xe_vm_types.h
drivers/i2c/busses/i2c-pxa.c
drivers/iommu/amd/init.c
drivers/media/platform/mediatek/vcodec/common/mtk_vcodec_fw_vpu.c
drivers/media/platform/mediatek/vcodec/decoder/mtk_vcodec_dec_drv.c
drivers/media/platform/mediatek/vcodec/decoder/mtk_vcodec_dec_drv.h
drivers/media/platform/mediatek/vcodec/decoder/vdec/vdec_hevc_req_multi_if.c
drivers/media/platform/mediatek/vcodec/decoder/vdec/vdec_vp8_if.c
drivers/media/platform/mediatek/vcodec/decoder/vdec/vdec_vp9_if.c
drivers/media/platform/mediatek/vcodec/decoder/vdec/vdec_vp9_req_lat_if.c
drivers/media/platform/mediatek/vcodec/decoder/vdec_vpu_if.c
drivers/media/platform/mediatek/vcodec/encoder/mtk_vcodec_enc_drv.c
drivers/media/platform/mediatek/vcodec/encoder/mtk_vcodec_enc_drv.h
drivers/media/platform/mediatek/vcodec/encoder/venc_vpu_if.c
drivers/mtd/devices/block2mtd.c
drivers/net/dsa/mt7530.c
drivers/net/dsa/mt7530.h
drivers/net/ethernet/amazon/ena/ena_com.c
drivers/net/ethernet/amazon/ena/ena_netdev.c
drivers/net/ethernet/amazon/ena/ena_xdp.c
drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h
drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
drivers/net/ethernet/mellanox/mlx5/core/en/rqt.c
drivers/net/ethernet/mellanox/mlx5/core/en/rqt.h
drivers/net/ethernet/mellanox/mlx5/core/en/selq.c
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
drivers/net/ethernet/mellanox/mlx5/core/main.c
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
drivers/net/ethernet/mellanox/mlx5/core/steering/dr_dbg.c
drivers/net/ethernet/microchip/sparx5/sparx5_port.c
drivers/net/ethernet/realtek/r8169_main.c
drivers/nvme/host/core.c
drivers/nvme/host/fc.c
drivers/nvme/host/nvme.h
drivers/nvme/host/zns.c
drivers/nvme/target/configfs.c
drivers/nvme/target/core.c
drivers/nvme/target/fc.c
drivers/of/dynamic.c
drivers/of/module.c
drivers/perf/riscv_pmu.c
drivers/platform/chrome/cros_ec_uart.c
drivers/platform/x86/acer-wmi.c
drivers/platform/x86/intel/hid.c
drivers/platform/x86/intel/vbtn.c
drivers/platform/x86/lg-laptop.c
drivers/platform/x86/toshiba_acpi.c
drivers/regulator/tps65132-regulator.c
drivers/s390/net/ism_drv.c
drivers/scsi/hisi_sas/hisi_sas_main.c
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
drivers/scsi/libsas/sas_expander.c
drivers/scsi/myrb.c
drivers/scsi/myrs.c
drivers/scsi/qla2xxx/qla_edif.c
drivers/scsi/sd.c
drivers/scsi/sg.c
drivers/spi/spi-fsl-lpspi.c
drivers/spi/spi-pci1xxxx.c
drivers/spi/spi-s3c64xx.c
drivers/target/target_core_configfs.c
drivers/thermal/gov_power_allocator.c
drivers/ufs/core/ufshcd.c
fs/aio.c
fs/bcachefs/acl.c
fs/bcachefs/bcachefs_format.h
fs/bcachefs/btree_gc.c
fs/bcachefs/btree_iter.h
fs/bcachefs/btree_journal_iter.c
fs/bcachefs/btree_key_cache.c
fs/bcachefs/btree_locking.c
fs/bcachefs/btree_node_scan.c
fs/bcachefs/btree_types.h
fs/bcachefs/btree_update_interior.c
fs/bcachefs/btree_update_interior.h
fs/bcachefs/chardev.c
fs/bcachefs/data_update.c
fs/bcachefs/debug.c
fs/bcachefs/eytzinger.c
fs/bcachefs/eytzinger.h
fs/bcachefs/journal_reclaim.c
fs/bcachefs/journal_types.h
fs/bcachefs/recovery.c
fs/bcachefs/snapshot.c
fs/bcachefs/super-io.c
fs/bcachefs/super.c
fs/bcachefs/sysfs.c
fs/bcachefs/tests.c
fs/bcachefs/util.h
fs/btrfs/delayed-inode.c
fs/btrfs/inode.c
fs/btrfs/ioctl.c
fs/btrfs/qgroup.c
fs/btrfs/root-tree.c
fs/btrfs/root-tree.h
fs/btrfs/transaction.c
fs/cramfs/inode.c
fs/ext4/super.c
fs/f2fs/super.c
fs/jfs/jfs_logmgr.c
fs/nfsd/nfs4state.c
fs/proc/bootconfig.c
fs/reiserfs/journal.c
fs/romfs/super.c
fs/smb/client/cached_dir.c
fs/smb/client/cifs_debug.c
fs/smb/client/cifsfs.c
fs/smb/client/cifsglob.h
fs/smb/client/cifsproto.h
fs/smb/client/cifssmb.c
fs/smb/client/connect.c
fs/smb/client/dfs.c
fs/smb/client/dfs.h
fs/smb/client/dfs_cache.c
fs/smb/client/dir.c
fs/smb/client/file.c
fs/smb/client/fs_context.c
fs/smb/client/fs_context.h
fs/smb/client/fscache.h
fs/smb/client/ioctl.c
fs/smb/client/misc.c
fs/smb/client/smb1ops.c
fs/smb/client/smb2misc.c
fs/smb/client/smb2ops.c
fs/smb/client/smb2pdu.c
fs/smb/client/smb2transport.c
fs/smb/server/ksmbd_netlink.h
fs/smb/server/mgmt/share_config.c
fs/smb/server/smb2ops.c
fs/smb/server/smb2pdu.c
fs/smb/server/transport_ipc.c
fs/super.c
fs/xfs/xfs_buf.c
fs/xfs/xfs_inode.c
fs/xfs/xfs_super.c
include/linux/blkdev.h
include/linux/bootconfig.h
include/linux/cc_platform.h
include/linux/compiler.h
include/linux/device.h
include/linux/energy_model.h
include/linux/fs.h
include/linux/gfp_types.h
include/linux/io_uring_types.h
include/linux/mm.h
include/linux/randomize_kstack.h
include/linux/secretmem.h
include/linux/stackdepot.h
include/linux/timecounter.h
include/linux/timekeeping.h
include/linux/timer.h
include/net/bluetooth/bluetooth.h
include/sound/hdaudio_ext.h
include/sound/tas2781-tlv.h
include/vdso/datapage.h
init/initramfs.c
init/main.c
io_uring/io_uring.c
io_uring/kbuf.c
io_uring/kbuf.h
io_uring/rw.c
kernel/kprobes.c
kernel/time/tick-sched.c
kernel/time/tick-sched.h
kernel/time/timer.c
kernel/time/timer_migration.c
lib/stackdepot.c
lib/test_ubsan.c
mm/memory.c
mm/vmalloc.c
net/9p/client.c
net/9p/trans_fd.c
net/bluetooth/hci_request.c
net/bluetooth/hci_sock.c
net/bluetooth/hci_sync.c
net/bluetooth/iso.c
net/bluetooth/l2cap_core.c
net/bluetooth/l2cap_sock.c
net/bluetooth/rfcomm/sock.c
net/bluetooth/sco.c
net/ipv4/netfilter/arp_tables.c
net/ipv4/netfilter/ip_tables.c
net/ipv6/netfilter/ip6_tables.c
net/sunrpc/svcsock.c
net/unix/garbage.c
scripts/gcc-plugins/stackleak_plugin.c
sound/oss/dmasound/dmasound_paula.c
sound/pci/emu10k1/emu10k1_callback.c
sound/pci/hda/cs35l41_hda_property.c
sound/pci/hda/cs35l56_hda_i2c.c
sound/pci/hda/cs35l56_hda_spi.c
sound/pci/hda/patch_realtek.c
sound/soc/amd/acp/acp-pci.c
sound/soc/codecs/cs-amp-lib.c
sound/soc/codecs/cs42l43.c
sound/soc/codecs/es8326.c
sound/soc/codecs/es8326.h
sound/soc/codecs/rt1316-sdw.c
sound/soc/codecs/rt1318-sdw.c
sound/soc/codecs/rt5682-sdw.c
sound/soc/codecs/rt700.c
sound/soc/codecs/rt711-sdca-sdw.c
sound/soc/codecs/rt711-sdca.c
sound/soc/codecs/rt711-sdw.c
sound/soc/codecs/rt711.c
sound/soc/codecs/rt712-sdca-dmic.c
sound/soc/codecs/rt712-sdca-sdw.c
sound/soc/codecs/rt712-sdca.c
sound/soc/codecs/rt715-sdca-sdw.c
sound/soc/codecs/rt715-sdca.c
sound/soc/codecs/rt715-sdw.c
sound/soc/codecs/rt715.c
sound/soc/codecs/rt722-sdca-sdw.c
sound/soc/codecs/rt722-sdca.c
sound/soc/codecs/wm_adsp.c
sound/soc/intel/avs/boards/da7219.c
sound/soc/intel/avs/boards/dmic.c
sound/soc/intel/avs/boards/es8336.c
sound/soc/intel/avs/boards/i2s_test.c
sound/soc/intel/avs/boards/max98357a.c
sound/soc/intel/avs/boards/max98373.c
sound/soc/intel/avs/boards/max98927.c
sound/soc/intel/avs/boards/nau8825.c
sound/soc/intel/avs/boards/probe.c
sound/soc/intel/avs/boards/rt274.c
sound/soc/intel/avs/boards/rt286.c
sound/soc/intel/avs/boards/rt298.c
sound/soc/intel/avs/boards/rt5514.c
sound/soc/intel/avs/boards/rt5663.c
sound/soc/intel/avs/boards/rt5682.c
sound/soc/intel/avs/boards/ssm4567.c
sound/soc/soc-ops.c
sound/soc/sof/amd/acp.c
sound/soc/sof/core.c
sound/soc/sof/intel/hda-common-ops.c
sound/soc/sof/intel/hda-dai-ops.c
sound/soc/sof/intel/hda-dsp.c
sound/soc/sof/intel/hda-pcm.c
sound/soc/sof/intel/hda-stream.c
sound/soc/sof/intel/hda.h
sound/soc/sof/intel/lnl.c
sound/soc/sof/intel/mtl.c
sound/soc/sof/intel/mtl.h
sound/soc/sof/ipc4-mtrace.c
sound/soc/sof/ipc4-pcm.c
sound/soc/sof/ipc4-priv.h
sound/soc/sof/ipc4-topology.c
sound/soc/sof/ops.h
sound/soc/sof/pcm.c
sound/soc/sof/sof-audio.h
sound/soc/sof/sof-priv.h
sound/usb/line6/driver.c
tools/include/linux/kernel.h
tools/include/linux/mm.h
tools/include/linux/panic.h [new file with mode: 0644]
tools/power/x86/turbostat/turbostat.8
tools/power/x86/turbostat/turbostat.c
tools/testing/selftests/mm/vm_util.h
tools/testing/selftests/turbostat/defcolumns.py [new file with mode: 0755]

index 59c9a841bf71494f3371c9658af76aa5644783a5..8284692f9610715fa2bc04f8771cb45d1b5ebf88 100644 (file)
--- a/.mailmap
+++ b/.mailmap
@@ -20,6 +20,7 @@ Adam Oldham <oldhamca@gmail.com>
 Adam Radford <aradford@gmail.com>
 Adriana Reus <adi.reus@gmail.com> <adriana.reus@intel.com>
 Adrian Bunk <bunk@stusta.de>
+Ajay Kaher <ajay.kaher@broadcom.com> <akaher@vmware.com>
 Akhil P Oommen <quic_akhilpo@quicinc.com> <akhilpo@codeaurora.org>
 Alan Cox <alan@lxorguk.ukuu.org.uk>
 Alan Cox <root@hraefn.swansea.linux.org.uk>
@@ -36,6 +37,7 @@ Alexei Avshalom Lazar <quic_ailizaro@quicinc.com> <ailizaro@codeaurora.org>
 Alexei Starovoitov <ast@kernel.org> <alexei.starovoitov@gmail.com>
 Alexei Starovoitov <ast@kernel.org> <ast@fb.com>
 Alexei Starovoitov <ast@kernel.org> <ast@plumgrid.com>
+Alexey Makhalov <alexey.amakhalov@broadcom.com> <amakhalov@vmware.com>
 Alex Hung <alexhung@gmail.com> <alex.hung@canonical.com>
 Alex Shi <alexs@kernel.org> <alex.shi@intel.com>
 Alex Shi <alexs@kernel.org> <alex.shi@linaro.org>
@@ -110,6 +112,7 @@ Brendan Higgins <brendan.higgins@linux.dev> <brendanhiggins@google.com>
 Brian Avery <b.avery@hp.com>
 Brian King <brking@us.ibm.com>
 Brian Silverman <bsilver16384@gmail.com> <brian.silverman@bluerivertech.com>
+Bryan Tan <bryan-bt.tan@broadcom.com> <bryantan@vmware.com>
 Cai Huoqing <cai.huoqing@linux.dev> <caihuoqing@baidu.com>
 Can Guo <quic_cang@quicinc.com> <cang@codeaurora.org>
 Carl Huang <quic_cjhuang@quicinc.com> <cjhuang@codeaurora.org>
@@ -529,6 +532,7 @@ Rocky Liao <quic_rjliao@quicinc.com> <rjliao@codeaurora.org>
 Roman Gushchin <roman.gushchin@linux.dev> <guro@fb.com>
 Roman Gushchin <roman.gushchin@linux.dev> <guroan@gmail.com>
 Roman Gushchin <roman.gushchin@linux.dev> <klamm@yandex-team.ru>
+Ronak Doshi <ronak.doshi@broadcom.com> <doshir@vmware.com>
 Muchun Song <muchun.song@linux.dev> <songmuchun@bytedance.com>
 Muchun Song <muchun.song@linux.dev> <smuchun@gmail.com>
 Ross Zwisler <zwisler@kernel.org> <ross.zwisler@linux.intel.com>
@@ -651,6 +655,7 @@ Viresh Kumar <vireshk@kernel.org> <viresh.kumar@st.com>
 Viresh Kumar <vireshk@kernel.org> <viresh.linux@gmail.com>
 Viresh Kumar <viresh.kumar@linaro.org> <viresh.kumar@linaro.org>
 Viresh Kumar <viresh.kumar@linaro.org> <viresh.kumar@linaro.com>
+Vishnu Dasa <vishnu.dasa@broadcom.com> <vdasa@vmware.com>
 Vivek Aknurwar <quic_viveka@quicinc.com> <viveka@codeaurora.org>
 Vivien Didelot <vivien.didelot@gmail.com> <vivien.didelot@savoirfairelinux.com>
 Vlad Dogaru <ddvlad@gmail.com> <vlad.dogaru@intel.com>
index cce768afec6bed11a961643dcdc2d1ae97848684..b70b1d8bd8e6572374ae10632f46757269f2fa7e 100644 (file)
@@ -138,11 +138,10 @@ associated with the source address of the indirect branch. Specifically,
 the BHB might be shared across privilege levels even in the presence of
 Enhanced IBRS.
 
-Currently the only known real-world BHB attack vector is via
-unprivileged eBPF. Therefore, it's highly recommended to not enable
-unprivileged eBPF, especially when eIBRS is used (without retpolines).
-For a full mitigation against BHB attacks, it's recommended to use
-retpolines (or eIBRS combined with retpolines).
+Previously the only known real-world BHB attack vector was via unprivileged
+eBPF. Further research has found attacks that don't require unprivileged eBPF.
+For a full mitigation against BHB attacks it is recommended to set BHI_DIS_S or
+use the BHB clearing sequence.
 
 Attack scenarios
 ----------------
@@ -430,6 +429,23 @@ The possible values in this file are:
   'PBRSB-eIBRS: Not affected'  CPU is not affected by PBRSB
   ===========================  =======================================================
 
+  - Branch History Injection (BHI) protection status:
+
+.. list-table::
+
+ * - BHI: Not affected
+   - System is not affected
+ * - BHI: Retpoline
+   - System is protected by retpoline
+ * - BHI: BHI_DIS_S
+   - System is protected by BHI_DIS_S
+ * - BHI: SW loop; KVM SW loop
+   - System is protected by software clearing sequence
+ * - BHI: Syscall hardening
+   - Syscalls are hardened against BHI
+ * - BHI: Syscall hardening; KVM: SW loop
+   - System is protected from userspace attacks by syscall hardening; KVM is protected by software clearing sequence
+
 Full mitigation might require a microcode update from the CPU
 vendor. When the necessary microcode is not available, the kernel will
 report vulnerability.
@@ -484,7 +500,11 @@ Spectre variant 2
 
    Systems which support enhanced IBRS (eIBRS) enable IBRS protection once at
    boot, by setting the IBRS bit, and they're automatically protected against
-   Spectre v2 variant attacks.
+   some Spectre v2 variant attacks. The BHB can still influence the choice of
+   indirect branch predictor entry, and although branch predictor entries are
+   isolated between modes when eIBRS is enabled, the BHB itself is not isolated
+   between modes. Systems which support BHI_DIS_S will set it to protect against
+   BHI attacks.
 
    On Intel's enhanced IBRS systems, this includes cross-thread branch target
    injections on SMT systems (STIBP). In other words, Intel eIBRS enables
@@ -638,6 +658,22 @@ kernel command line.
                spectre_v2=off. Spectre variant 1 mitigations
                cannot be disabled.
 
+       spectre_bhi=
+
+               [X86] Control mitigation of Branch History Injection
+               (BHI) vulnerability. Syscalls are hardened against BHI
+               regardless of this setting. This setting affects the deployment
+               of the HW BHI control and the SW BHB clearing sequence.
+
+               on
+                       unconditionally enable.
+               off
+                       unconditionally disable.
+               auto
+                       enable if hardware mitigation
+                       control (BHI_DIS_S) is available, otherwise
+                       enable alternate mitigation in KVM.
+
 For spectre_v2_user see Documentation/admin-guide/kernel-parameters.txt
 
 Mitigation selection guide
index 623fce7d5fcd0c4392432908e21aaba5134e3aa0..70046a019d42d80b1f56d1e80577dfd084fdd5f8 100644 (file)
        sonypi.*=       [HW] Sony Programmable I/O Control Device driver
                        See Documentation/admin-guide/laptops/sonypi.rst
 
+       spectre_bhi=    [X86] Control mitigation of Branch History Injection
+                       (BHI) vulnerability. Syscalls are hardened against BHI
+                       regardless of this setting. This setting affects the
+                       deployment of the HW BHI control and the SW BHB
+                       clearing sequence.
+
+                       on   - unconditionally enable.
+                       off  - unconditionally disable.
+                       auto - (default) enable hardware mitigation
+                              (BHI_DIS_S) if available, otherwise enable
+                              alternate mitigation in KVM.
+
        spectre_v2=     [X86,EARLY] Control mitigation of Spectre variant 2
                        (indirect branch speculation) vulnerability.
                        The default operation protects the kernel from
index c5aa187026e3a53e1f1638f8808530cc5920df03..43f6fb6c939276dcac480ccbe3e9e30fa58935a3 100644 (file)
@@ -1,5 +1,3 @@
-Status: Unstable - ABI compatibility may be broken in the future
-
 Binding for Keystone gate control driver which uses PSC controller IP.
 
 This binding uses the common clock binding[1].
index 9a3fbc66560652b4fb05033aa7d900b1f8759fe0..69b0eb7c03c9e60d31483305e78095a6ce1c7cb1 100644 (file)
@@ -1,5 +1,3 @@
-Status: Unstable - ABI compatibility may be broken in the future
-
 Binding for keystone PLLs. The main PLL IP typically has a multiplier,
 a divider and a post divider. The additional PLL IPs like ARMPLL, DDRPLL
 and PAPLL are controlled by the memory mapped register where as the Main
index 4c8a2ce2cd70181ead140f3df75472fdc0151bdb..3122360adcf3c0abe3d50f5a2f326427c16c7cfa 100644 (file)
@@ -1,7 +1,5 @@
 Binding for Texas Instruments ADPLL clock.
 
-Binding status: Unstable - ABI compatibility may be broken in the future
-
 This binding uses the common clock binding[1]. It assumes a
 register-mapped ADPLL with two to three selectable input clocks
 and three to four children.
index ade4dd4c30f0e12804a94845b71ee462e30f1d99..bbd505c1199df5b01e9abb2b001c9315c1b2c061 100644 (file)
@@ -1,7 +1,5 @@
 Binding for Texas Instruments APLL clock.
 
-Binding status: Unstable - ABI compatibility may be broken in the future
-
 This binding uses the common clock binding[1].  It assumes a
 register-mapped APLL with usually two selectable input clocks
 (reference clock and bypass clock), with analog phase locked
index 7c735dde9fe971d7ad20f6c6b403581421b2b4c4..05645a10a9e33ce6a6b9bb7b06388442c3a2b85c 100644 (file)
@@ -1,7 +1,5 @@
 Binding for Texas Instruments autoidle clock.
 
-Binding status: Unstable - ABI compatibility may be broken in the future
-
 This binding uses the common clock binding[1]. It assumes a register mapped
 clock which can be put to idle automatically by hardware based on the usage
 and a configuration bit setting. Autoidle clock is never an individual
index 9c6199249ce596cd4c1d4144b08b1b1868b968fa..edf0b5d427682421f4475f09e41d84c2af03ef2c 100644 (file)
@@ -1,7 +1,5 @@
 Binding for Texas Instruments clockdomain.
 
-Binding status: Unstable - ABI compatibility may be broken in the future
-
 This binding uses the common clock binding[1] in consumer role.
 Every clock on TI SoC belongs to one clockdomain, but software
 only needs this information for specific clocks which require
index 33ac7c9ad053c7d13e93d3142f86357a76476ea5..6f7e1331b5466cfa63d25377b64e76354aad5b5d 100644 (file)
@@ -1,7 +1,5 @@
 Binding for TI composite clock.
 
-Binding status: Unstable - ABI compatibility may be broken in the future
-
 This binding uses the common clock binding[1]. It assumes a
 register-mapped composite clock with multiple different sub-types;
 
index 9b13b32974f9926874d1a893fa00be961c5aef4d..4d7c76f0b356950194a5f1184eb1ceb1a1a54ca0 100644 (file)
@@ -1,7 +1,5 @@
 Binding for TI divider clock
 
-Binding status: Unstable - ABI compatibility may be broken in the future
-
 This binding uses the common clock binding[1].  It assumes a
 register-mapped adjustable clock rate divider that does not gate and has
 only one input clock or parent.  By default the value programmed into
index 37a7cb6ad07d873fec2b9be8bad19a4ebea7bcc2..14a1b72c2e712016d97fc7632a7767bc562207e9 100644 (file)
@@ -1,7 +1,5 @@
 Binding for Texas Instruments DPLL clock.
 
-Binding status: Unstable - ABI compatibility may be broken in the future
-
 This binding uses the common clock binding[1].  It assumes a
 register-mapped DPLL with usually two selectable input clocks
 (reference clock and bypass clock), with digital phase locked
index c19b3f253b8cf7fa31ed962ef076ce6e56681f4c..88986ef39ddd245f637328155e3e6958487652c7 100644 (file)
@@ -1,7 +1,5 @@
 Binding for Texas Instruments FAPLL clock.
 
-Binding status: Unstable - ABI compatibility may be broken in the future
-
 This binding uses the common clock binding[1]. It assumes a
 register-mapped FAPLL with usually two selectable input clocks
 (reference clock and bypass clock), and one or more child
index 518e3c1422762cfd32676fd9aee8a2645b0dec6d..dc69477b6e98eb8e1a37f7d488f1ab5bd58b7971 100644 (file)
@@ -1,7 +1,5 @@
 Binding for TI fixed factor rate clock sources.
 
-Binding status: Unstable - ABI compatibility may be broken in the future
-
 This binding uses the common clock binding[1], and also uses the autoidle
 support from TI autoidle clock [2].
 
index 4982615c01b9cb7fd187828e2c1d03a6c1d59255..a8e0335b006a07b1733a116c7d1ccea34af69891 100644 (file)
@@ -1,7 +1,5 @@
 Binding for Texas Instruments gate clock.
 
-Binding status: Unstable - ABI compatibility may be broken in the future
-
 This binding uses the common clock binding[1]. This clock is
 quite much similar to the basic gate-clock [2], however,
 it supports a number of additional features. If no register
index d3eb5ca92a7fe6e349f974a97f9eeb4c721e5304..85fb1f2d2d286b95b2bdabb6c0b421cdaa3d33c7 100644 (file)
@@ -1,7 +1,5 @@
 Binding for Texas Instruments interface clock.
 
-Binding status: Unstable - ABI compatibility may be broken in the future
-
 This binding uses the common clock binding[1]. This clock is
 quite much similar to the basic gate-clock [2], however,
 it supports a number of additional features, including
index b33f641f104321ff1e7d5f6ef5fc66a6a79f9d76..cd56d3c1c09f3bf8ff6d9aa0c0fc859a2bad76af 100644 (file)
@@ -1,7 +1,5 @@
 Binding for TI mux clock.
 
-Binding status: Unstable - ABI compatibility may be broken in the future
-
 This binding uses the common clock binding[1].  It assumes a
 register-mapped multiplexer with multiple input clock signals or
 parents, one of which can be selected as output.  This clock does not
index a9bdd2b59dcab62b3cf6eac23dbefb44a71d648d..8a68331075a098ab8f0a1fece9525c7a2f7d6ddc 100644 (file)
@@ -144,6 +144,8 @@ Example::
                #dma-cells = <1>;
                clocks = <&clock_controller 0>, <&clock_controller 1>;
                clock-names = "bus", "host";
+               #address-cells = <1>;
+               #size-cells = <1>;
                vendor,custom-property = <2>;
                status = "disabled";
 
index 25f8658e216ff03c46b244fcd7679d0e65cf6797..48a49c516b62cb03a0293464fa09c763d4ec6ad2 100644 (file)
@@ -1,9 +1,6 @@
 TI Davinci DSP devices
 =======================
 
-Binding status: Unstable - Subject to changes for DT representation of clocks
-                          and resets
-
 The TI Davinci family of SoCs usually contains a TI DSP Core sub-system that
 is used to offload some of the processor-intensive tasks or algorithms, for
 achieving various system level goals.
index 397f75909b20588506fdff7d7fdc2d8f07d49ba9..ce1a6505eb5149dedc4ecf5ec975ad2a612663eb 100644 (file)
@@ -51,7 +51,7 @@ properties:
   ranges: true
 
 patternProperties:
-  "^clock-controller@[0-9a-z]+$":
+  "^clock-controller@[0-9a-f]+$":
     $ref: /schemas/clock/fsl,flexspi-clock.yaml#
 
 required:
index 8d088b5fe8236b667c9aa3d7e5e7341bc44b38d1..a6a511b00a1281a36b452ed595a1d376c6531eea 100644 (file)
@@ -41,7 +41,7 @@ properties:
   ranges: true
 
 patternProperties:
-  "^interrupt-controller@[a-z0-9]+$":
+  "^interrupt-controller@[a-f0-9]+$":
     $ref: /schemas/interrupt-controller/fsl,ls-extirq.yaml#
 
 required:
index 7a4a6ab85970d6ebad1b45d7d57468b8de2f6b55..ab8f28993139e5443817207a830d99bfcde25c48 100644 (file)
@@ -60,7 +60,7 @@ properties:
       be implemented in an always-on power domain."
 
 patternProperties:
-  '^frame@[0-9a-z]*$':
+  '^frame@[0-9a-f]+$':
     type: object
     additionalProperties: false
     description: A timer node has up to 8 frame sub-nodes, each with the following properties.
index 10c146424baa1edd24c3e316625c07a35816f7f6..cd3680dc002f961f0bb95164b98e08279a755a41 100644 (file)
@@ -27,10 +27,13 @@ properties:
           - qcom,msm8996-ufshc
           - qcom,msm8998-ufshc
           - qcom,sa8775p-ufshc
+          - qcom,sc7180-ufshc
           - qcom,sc7280-ufshc
+          - qcom,sc8180x-ufshc
           - qcom,sc8280xp-ufshc
           - qcom,sdm845-ufshc
           - qcom,sm6115-ufshc
+          - qcom,sm6125-ufshc
           - qcom,sm6350-ufshc
           - qcom,sm8150-ufshc
           - qcom,sm8250-ufshc
@@ -42,11 +45,11 @@ properties:
       - const: jedec,ufs-2.0
 
   clocks:
-    minItems: 8
+    minItems: 7
     maxItems: 11
 
   clock-names:
-    minItems: 8
+    minItems: 7
     maxItems: 11
 
   dma-coherent: true
@@ -112,6 +115,31 @@ required:
 allOf:
   - $ref: ufs-common.yaml
 
+  - if:
+      properties:
+        compatible:
+          contains:
+            enum:
+              - qcom,sc7180-ufshc
+    then:
+      properties:
+        clocks:
+          minItems: 7
+          maxItems: 7
+        clock-names:
+          items:
+            - const: core_clk
+            - const: bus_aggr_clk
+            - const: iface_clk
+            - const: core_clk_unipro
+            - const: ref_clk
+            - const: tx_lane0_sync_clk
+            - const: rx_lane0_sync_clk
+        reg:
+          maxItems: 1
+        reg-names:
+          maxItems: 1
+
   - if:
       properties:
         compatible:
@@ -120,6 +148,7 @@ allOf:
               - qcom,msm8998-ufshc
               - qcom,sa8775p-ufshc
               - qcom,sc7280-ufshc
+              - qcom,sc8180x-ufshc
               - qcom,sc8280xp-ufshc
               - qcom,sm8250-ufshc
               - qcom,sm8350-ufshc
@@ -215,6 +244,7 @@ allOf:
           contains:
             enum:
               - qcom,sm6115-ufshc
+              - qcom,sm6125-ufshc
     then:
       properties:
         clocks:
@@ -248,7 +278,7 @@ allOf:
         reg:
           maxItems: 1
         clocks:
-          minItems: 8
+          minItems: 7
           maxItems: 8
     else:
       properties:
@@ -256,7 +286,7 @@ allOf:
           minItems: 1
           maxItems: 2
         clocks:
-          minItems: 8
+          minItems: 7
           maxItems: 11
 
 unevaluatedProperties: false
diff --git a/Documentation/filesystems/bcachefs/index.rst b/Documentation/filesystems/bcachefs/index.rst
new file mode 100644 (file)
index 0000000..e2bd61c
--- /dev/null
@@ -0,0 +1,11 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+bcachefs Documentation
+======================
+
+.. toctree::
+   :maxdepth: 2
+   :numbered:
+
+   errorcodes
index 0ea1e44fa02823ffd51f4739a3a9aab635a35bbe..1f9b4c905a6a7c0646fca9764829151582eb6e7c 100644 (file)
@@ -69,6 +69,7 @@ Documentation for filesystem implementations.
    afs
    autofs
    autofs-mount-control
+   bcachefs/index
    befs
    bfs
    btrfs
index f389bf6d78a3cb312b0ac87d880acacb868c27e3..b5b89687680b98eace9cb453ec2690bad735a6f2 100644 (file)
@@ -3572,6 +3572,7 @@ S:        Supported
 C:     irc://irc.oftc.net/bcache
 T:     git https://evilpiepirate.org/git/bcachefs.git
 F:     fs/bcachefs/
+F:     Documentation/filesystems/bcachefs/
 
 BDISP ST MEDIA DRIVER
 M:     Fabien Dessenne <fabien.dessenne@foss.st.com>
@@ -16728,9 +16729,9 @@ F:      include/uapi/linux/ppdev.h
 
 PARAVIRT_OPS INTERFACE
 M:     Juergen Gross <jgross@suse.com>
-R:     Ajay Kaher <akaher@vmware.com>
-R:     Alexey Makhalov <amakhalov@vmware.com>
-R:     VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
+R:     Ajay Kaher <ajay.kaher@broadcom.com>
+R:     Alexey Makhalov <alexey.amakhalov@broadcom.com>
+R:     Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
 L:     virtualization@lists.linux.dev
 L:     x86@kernel.org
 S:     Supported
@@ -22425,6 +22426,7 @@ S:      Maintained
 W:     https://kernsec.org/wiki/index.php/Linux_Kernel_Integrity
 Q:     https://patchwork.kernel.org/project/linux-integrity/list/
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd.git
+F:     Documentation/devicetree/bindings/tpm/
 F:     drivers/char/tpm/
 
 TPS546D24 DRIVER
@@ -22571,6 +22573,7 @@ Q:      https://patchwork.kernel.org/project/linux-pm/list/
 B:     https://bugzilla.kernel.org
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux.git turbostat
 F:     tools/power/x86/turbostat/
+F:     tools/testing/selftests/turbostat/
 
 TW5864 VIDEO4LINUX DRIVER
 M:     Bluecherry Maintainers <maintainers@bluecherrydvr.com>
@@ -23649,9 +23652,9 @@ S:      Supported
 F:     drivers/misc/vmw_balloon.c
 
 VMWARE HYPERVISOR INTERFACE
-M:     Ajay Kaher <akaher@vmware.com>
-M:     Alexey Makhalov <amakhalov@vmware.com>
-R:     VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
+M:     Ajay Kaher <ajay.kaher@broadcom.com>
+M:     Alexey Makhalov <alexey.amakhalov@broadcom.com>
+R:     Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
 L:     virtualization@lists.linux.dev
 L:     x86@kernel.org
 S:     Supported
@@ -23660,33 +23663,34 @@ F:    arch/x86/include/asm/vmware.h
 F:     arch/x86/kernel/cpu/vmware.c
 
 VMWARE PVRDMA DRIVER
-M:     Bryan Tan <bryantan@vmware.com>
-M:     Vishnu Dasa <vdasa@vmware.com>
-R:     VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
+M:     Bryan Tan <bryan-bt.tan@broadcom.com>
+M:     Vishnu Dasa <vishnu.dasa@broadcom.com>
+R:     Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
 L:     linux-rdma@vger.kernel.org
 S:     Supported
 F:     drivers/infiniband/hw/vmw_pvrdma/
 
 VMWARE PVSCSI DRIVER
-M:     Vishal Bhakta <vbhakta@vmware.com>
-R:     VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
+M:     Vishal Bhakta <vishal.bhakta@broadcom.com>
+R:     Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
 L:     linux-scsi@vger.kernel.org
 S:     Supported
 F:     drivers/scsi/vmw_pvscsi.c
 F:     drivers/scsi/vmw_pvscsi.h
 
 VMWARE VIRTUAL PTP CLOCK DRIVER
-R:     Ajay Kaher <akaher@vmware.com>
-R:     Alexey Makhalov <amakhalov@vmware.com>
-R:     VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
+M:     Nick Shi <nick.shi@broadcom.com>
+R:     Ajay Kaher <ajay.kaher@broadcom.com>
+R:     Alexey Makhalov <alexey.amakhalov@broadcom.com>
+R:     Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
 L:     netdev@vger.kernel.org
 S:     Supported
 F:     drivers/ptp/ptp_vmw.c
 
 VMWARE VMCI DRIVER
-M:     Bryan Tan <bryantan@vmware.com>
-M:     Vishnu Dasa <vdasa@vmware.com>
-R:     VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
+M:     Bryan Tan <bryan-bt.tan@broadcom.com>
+M:     Vishnu Dasa <vishnu.dasa@broadcom.com>
+R:     Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
 L:     linux-kernel@vger.kernel.org
 S:     Supported
 F:     drivers/misc/vmw_vmci/
@@ -23701,16 +23705,16 @@ F:    drivers/input/mouse/vmmouse.c
 F:     drivers/input/mouse/vmmouse.h
 
 VMWARE VMXNET3 ETHERNET DRIVER
-M:     Ronak Doshi <doshir@vmware.com>
-R:     VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
+M:     Ronak Doshi <ronak.doshi@broadcom.com>
+R:     Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
 L:     netdev@vger.kernel.org
 S:     Supported
 F:     drivers/net/vmxnet3/
 
 VMWARE VSOCK VMCI TRANSPORT DRIVER
-M:     Bryan Tan <bryantan@vmware.com>
-M:     Vishnu Dasa <vdasa@vmware.com>
-R:     VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
+M:     Bryan Tan <bryan-bt.tan@broadcom.com>
+M:     Vishnu Dasa <vishnu.dasa@broadcom.com>
+R:     Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
 L:     linux-kernel@vger.kernel.org
 S:     Supported
 F:     net/vmw_vsock/vmci_transport*
index 4bef6323c47decb8805896b3a017213ef461c44e..e1bf12891cb0e4a7471d60cf4b2eb0050d600d2f 100644 (file)
--- a/Makefile
+++ b/Makefile
@@ -2,7 +2,7 @@
 VERSION = 6
 PATCHLEVEL = 9
 SUBLEVEL = 0
-EXTRAVERSION = -rc2
+EXTRAVERSION = -rc3
 NAME = Hurr durr I'ma ninja sloth
 
 # *DOCUMENTATION*
index 162b030ab9da33fd089bcdf269dde014ead7e712..0d022599eb61b38183ffd34eae2bfa9c2f643a56 100644 (file)
@@ -761,7 +761,6 @@ static void sve_init_header_from_task(struct user_sve_header *header,
 {
        unsigned int vq;
        bool active;
-       bool fpsimd_only;
        enum vec_type task_type;
 
        memset(header, 0, sizeof(*header));
@@ -777,12 +776,10 @@ static void sve_init_header_from_task(struct user_sve_header *header,
        case ARM64_VEC_SVE:
                if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT))
                        header->flags |= SVE_PT_VL_INHERIT;
-               fpsimd_only = !test_tsk_thread_flag(target, TIF_SVE);
                break;
        case ARM64_VEC_SME:
                if (test_tsk_thread_flag(target, TIF_SME_VL_INHERIT))
                        header->flags |= SVE_PT_VL_INHERIT;
-               fpsimd_only = false;
                break;
        default:
                WARN_ON_ONCE(1);
@@ -790,7 +787,7 @@ static void sve_init_header_from_task(struct user_sve_header *header,
        }
 
        if (active) {
-               if (fpsimd_only) {
+               if (target->thread.fp_type == FP_STATE_FPSIMD) {
                        header->flags |= SVE_PT_REGS_FPSIMD;
                } else {
                        header->flags |= SVE_PT_REGS_SVE;
index 49a70f8c3cab22b758dd290a9fab2374a62abae9..b6aeb1f70e2a038ac2eb3bfe6c402bd37b4dcd6a 100644 (file)
                #size-cells = <2>;
                dma-coherent;
 
+               isa@18000000 {
+                       compatible = "isa";
+                       #size-cells = <1>;
+                       #address-cells = <2>;
+                       ranges = <1 0x0 0x0 0x18000000 0x4000>;
+               };
+
                liointc0: interrupt-controller@1fe01400 {
                        compatible = "loongson,liointc-2.0";
                        reg = <0x0 0x1fe01400 0x0 0x40>,
index dca91caf895e3cd9e428e75b91da9392bfb49d82..74b99bd234cc38df9a087915280e86ddb5bd56d4 100644 (file)
 
 &gmac0 {
        status = "okay";
+
+       phy-mode = "gmii";
+       phy-handle = <&phy0>;
+       mdio {
+               compatible = "snps,dwmac-mdio";
+               #address-cells = <1>;
+               #size-cells = <0>;
+               phy0: ethernet-phy@0 {
+                       reg = <2>;
+               };
+       };
 };
 
 &gmac1 {
        status = "okay";
+
+       phy-mode = "gmii";
+       phy-handle = <&phy1>;
+       mdio {
+               compatible = "snps,dwmac-mdio";
+               #address-cells = <1>;
+               #size-cells = <0>;
+               phy1: ethernet-phy@1 {
+                       reg = <2>;
+               };
+       };
 };
 
 &gmac2 {
        status = "okay";
+
+       phy-mode = "rgmii";
+       phy-handle = <&phy2>;
+       mdio {
+               compatible = "snps,dwmac-mdio";
+               #address-cells = <1>;
+               #size-cells = <0>;
+               phy2: ethernet-phy@2 {
+                       reg = <0>;
+               };
+       };
 };
index a231949b5f553a3814f48f6875e65ac2ed73d09a..9eab2d02cbe8bff12a26ce11dd7ac1543b7c1f82 100644 (file)
                #address-cells = <2>;
                #size-cells = <2>;
 
+               isa@18400000 {
+                       compatible = "isa";
+                       #size-cells = <1>;
+                       #address-cells = <2>;
+                       ranges = <1 0x0 0x0 0x18400000 0x4000>;
+               };
+
                pmc: power-management@100d0000 {
                        compatible = "loongson,ls2k2000-pmc", "loongson,ls2k0500-pmc", "syscon";
                        reg = <0x0 0x100d0000 0x0 0x58>;
                msi: msi-controller@1fe01140 {
                        compatible = "loongson,pch-msi-1.0";
                        reg = <0x0 0x1fe01140 0x0 0x8>;
+                       interrupt-controller;
+                       #interrupt-cells = <1>;
                        msi-controller;
                        loongson,msi-base-vec = <64>;
                        loongson,msi-num-vecs = <192>;
                        #address-cells = <3>;
                        #size-cells = <2>;
                        device_type = "pci";
+                       msi-parent = <&msi>;
                        bus-range = <0x0 0xff>;
-                       ranges = <0x01000000 0x0 0x00008000 0x0 0x18400000 0x0 0x00008000>,
+                       ranges = <0x01000000 0x0 0x00008000 0x0 0x18408000 0x0 0x00008000>,
                                 <0x02000000 0x0 0x60000000 0x0 0x60000000 0x0 0x20000000>;
 
                        gmac0: ethernet@3,0 {
                                reg = <0x1800 0x0 0x0 0x0 0x0>;
-                               interrupts = <12 IRQ_TYPE_LEVEL_HIGH>;
+                               interrupts = <12 IRQ_TYPE_LEVEL_HIGH>,
+                                            <13 IRQ_TYPE_LEVEL_HIGH>;
+                               interrupt-names = "macirq", "eth_lpi";
                                interrupt-parent = <&pic>;
                                status = "disabled";
                        };
 
                        gmac1: ethernet@3,1 {
                                reg = <0x1900 0x0 0x0 0x0 0x0>;
-                               interrupts = <14 IRQ_TYPE_LEVEL_HIGH>;
+                               interrupts = <14 IRQ_TYPE_LEVEL_HIGH>,
+                                            <15 IRQ_TYPE_LEVEL_HIGH>;
+                               interrupt-names = "macirq", "eth_lpi";
                                interrupt-parent = <&pic>;
                                status = "disabled";
                        };
 
                        gmac2: ethernet@3,2 {
                                reg = <0x1a00 0x0 0x0 0x0 0x0>;
-                               interrupts = <17 IRQ_TYPE_LEVEL_HIGH>;
+                               interrupts = <17 IRQ_TYPE_LEVEL_HIGH>,
+                                            <18 IRQ_TYPE_LEVEL_HIGH>;
+                               interrupt-names = "macirq", "eth_lpi";
                                interrupt-parent = <&pic>;
                                status = "disabled";
                        };
index b24437e28c6eda457b2be003b51ad3809600f7cc..7bd47d65bf7a048fda5183ed6844d5dbc129b232 100644 (file)
@@ -11,6 +11,7 @@
 #define _ASM_ADDRSPACE_H
 
 #include <linux/const.h>
+#include <linux/sizes.h>
 
 #include <asm/loongarch.h>
 
index 4a8adcca329b81e4f289dd7825fb15dbf2f4f7a9..c2f9979b2979e5e92e791e3f8304975db9e929c9 100644 (file)
 #include <asm/pgtable-bits.h>
 #include <asm/string.h>
 
-/*
- * Change "struct page" to physical address.
- */
-#define page_to_phys(page)     ((phys_addr_t)page_to_pfn(page) << PAGE_SHIFT)
-
 extern void __init __iomem *early_ioremap(u64 phys_addr, unsigned long size);
 extern void __init early_iounmap(void __iomem *addr, unsigned long size);
 
@@ -73,6 +68,21 @@ extern void __memcpy_fromio(void *to, const volatile void __iomem *from, size_t
 
 #define __io_aw() mmiowb()
 
+#ifdef CONFIG_KFENCE
+#define virt_to_phys(kaddr)                                                            \
+({                                                                                     \
+       (likely((unsigned long)kaddr < vm_map_base)) ? __pa((unsigned long)kaddr) :     \
+       page_to_phys(tlb_virt_to_page((unsigned long)kaddr)) + offset_in_page((unsigned long)kaddr);\
+})
+
+#define phys_to_virt(paddr)                                                            \
+({                                                                                     \
+       extern char *__kfence_pool;                                                     \
+       (unlikely(__kfence_pool == NULL)) ? __va((unsigned long)paddr) :                \
+       page_address(phys_to_page((unsigned long)paddr)) + offset_in_page((unsigned long)paddr);\
+})
+#endif
+
 #include <asm-generic/io.h>
 
 #define ARCH_HAS_VALID_PHYS_ADDR_RANGE
index 6c82aea1c99398c46484a77cc28da1316799affb..a6a5760da3a3323641e3fa422f3da87cdb4b66f8 100644 (file)
@@ -16,6 +16,7 @@
 static inline bool arch_kfence_init_pool(void)
 {
        int err;
+       char *kaddr, *vaddr;
        char *kfence_pool = __kfence_pool;
        struct vm_struct *area;
 
@@ -35,6 +36,14 @@ static inline bool arch_kfence_init_pool(void)
                return false;
        }
 
+       kaddr = kfence_pool;
+       vaddr = __kfence_pool;
+       while (kaddr < kfence_pool + KFENCE_POOL_SIZE) {
+               set_page_address(virt_to_page(kaddr), vaddr);
+               kaddr += PAGE_SIZE;
+               vaddr += PAGE_SIZE;
+       }
+
        return true;
 }
 
index 44027060c54a28bd34a80f538135491e3ebc758a..e85df33f11c77212c2e8ec8e6b3f1dbb955bc622 100644 (file)
@@ -78,7 +78,26 @@ typedef struct { unsigned long pgprot; } pgprot_t;
 struct page *dmw_virt_to_page(unsigned long kaddr);
 struct page *tlb_virt_to_page(unsigned long kaddr);
 
-#define virt_to_pfn(kaddr)     PFN_DOWN(PHYSADDR(kaddr))
+#define pfn_to_phys(pfn)       __pfn_to_phys(pfn)
+#define phys_to_pfn(paddr)     __phys_to_pfn(paddr)
+
+#define page_to_phys(page)     pfn_to_phys(page_to_pfn(page))
+#define phys_to_page(paddr)    pfn_to_page(phys_to_pfn(paddr))
+
+#ifndef CONFIG_KFENCE
+
+#define page_to_virt(page)     __va(page_to_phys(page))
+#define virt_to_page(kaddr)    phys_to_page(__pa(kaddr))
+
+#else
+
+#define WANT_PAGE_VIRTUAL
+
+#define page_to_virt(page)                                                             \
+({                                                                                     \
+       extern char *__kfence_pool;                                                     \
+       (__kfence_pool == NULL) ? __va(page_to_phys(page)) : page_address(page);        \
+})
 
 #define virt_to_page(kaddr)                                                            \
 ({                                                                                     \
@@ -86,6 +105,11 @@ struct page *tlb_virt_to_page(unsigned long kaddr);
        dmw_virt_to_page((unsigned long)kaddr) : tlb_virt_to_page((unsigned long)kaddr);\
 })
 
+#endif
+
+#define pfn_to_virt(pfn)       page_to_virt(pfn_to_page(pfn))
+#define virt_to_pfn(kaddr)     page_to_pfn(virt_to_page(kaddr))
+
 extern int __virt_addr_valid(volatile void *kaddr);
 #define virt_addr_valid(kaddr) __virt_addr_valid((volatile void *)(kaddr))
 
index a9630a81b38abbfc575ea4174af049ccd5a9a888..89af7c12e8c08d4faab2919cf22034b5ab0f5a6b 100644 (file)
@@ -4,6 +4,7 @@
  */
 #include <linux/export.h>
 #include <linux/io.h>
+#include <linux/kfence.h>
 #include <linux/memblock.h>
 #include <linux/mm.h>
 #include <linux/mman.h>
@@ -111,6 +112,9 @@ int __virt_addr_valid(volatile void *kaddr)
 {
        unsigned long vaddr = (unsigned long)kaddr;
 
+       if (is_kfence_address((void *)kaddr))
+               return 1;
+
        if ((vaddr < PAGE_OFFSET) || (vaddr >= vm_map_base))
                return 0;
 
index 2aae72e638713a658475e6fb82fc73eae0fc3469..bda018150000e66b906420ea7e3a5f79472ca352 100644 (file)
 
 struct page *dmw_virt_to_page(unsigned long kaddr)
 {
-       return pfn_to_page(virt_to_pfn(kaddr));
+       return phys_to_page(__pa(kaddr));
 }
 EXPORT_SYMBOL(dmw_virt_to_page);
 
 struct page *tlb_virt_to_page(unsigned long kaddr)
 {
-       return pfn_to_page(pte_pfn(*virt_to_kpte(kaddr)));
+       return phys_to_page(pfn_to_phys(pte_pfn(*virt_to_kpte(kaddr))));
 }
 EXPORT_SYMBOL(tlb_virt_to_page);
 
index 8d98af5c7201bb34570d97876d53d663e9068964..9a8393e6b4a85ecdb22691720c6a266eb5d7aa2d 100644 (file)
@@ -21,7 +21,8 @@
 
 void __init early_init_devtree(void *params)
 {
-       __be32 *dtb = (u32 *)__dtb_start;
+       __be32 __maybe_unused *dtb = (u32 *)__dtb_start;
+
 #if defined(CONFIG_NIOS2_DTB_AT_PHYS_ADDR)
        if (be32_to_cpup((__be32 *)CONFIG_NIOS2_DTB_PHYS_ADDR) ==
                 OF_DT_HEADER) {
@@ -30,8 +31,11 @@ void __init early_init_devtree(void *params)
                return;
        }
 #endif
+
+#ifdef CONFIG_NIOS2_DTB_SOURCE_BOOL
        if (be32_to_cpu((__be32) *dtb) == OF_DT_HEADER)
                params = (void *)__dtb_start;
+#endif
 
        early_init_dt_scan(params);
 }
index f0a4cf01e85c0312ee4b1350c0024f975a0b8120..78302f6c258006471bb4ba3dfb1f186c0137ef66 100644 (file)
@@ -4,7 +4,6 @@
 
 #ifndef __ASSEMBLY__
 
-#include <asm/page.h>
 #include <asm/vdso/timebase.h>
 #include <asm/barrier.h>
 #include <asm/unistd.h>
@@ -95,7 +94,7 @@ const struct vdso_data *__arch_get_vdso_data(void);
 static __always_inline
 const struct vdso_data *__arch_get_timens_vdso_data(const struct vdso_data *vd)
 {
-       return (void *)vd + PAGE_SIZE;
+       return (void *)vd + (1U << CONFIG_PAGE_SHIFT);
 }
 #endif
 
index 252d63942f34ebe08a3087d12bee3a1c4833f15a..5b3115a198522684cbcba474953e07d4e76e9ff5 100644 (file)
@@ -151,7 +151,7 @@ endif
 endif
 
 vdso-install-y                 += arch/riscv/kernel/vdso/vdso.so.dbg
-vdso-install-$(CONFIG_COMPAT)  += arch/riscv/kernel/compat_vdso/compat_vdso.so.dbg:../compat_vdso/compat_vdso.so
+vdso-install-$(CONFIG_COMPAT)  += arch/riscv/kernel/compat_vdso/compat_vdso.so.dbg
 
 ifneq ($(CONFIG_XIP_KERNEL),y)
 ifeq ($(CONFIG_RISCV_M_MODE)$(CONFIG_ARCH_CANAAN),yy)
index 97fcde30e2477d55f8046d844191a1e4b0ba5e1a..9f8ea0e33eb10424c5a05eb55849eacce627c3c3 100644 (file)
@@ -593,6 +593,12 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
        return ptep_test_and_clear_young(vma, address, ptep);
 }
 
+#define pgprot_nx pgprot_nx
+static inline pgprot_t pgprot_nx(pgprot_t _prot)
+{
+       return __pgprot(pgprot_val(_prot) & ~_PAGE_EXEC);
+}
+
 #define pgprot_noncached pgprot_noncached
 static inline pgprot_t pgprot_noncached(pgprot_t _prot)
 {
index 980094c2e9761d19b562d4dd8f9f707fd13480ae..ac80216549ffa6fce76ffe1759ad4e0da4609f9f 100644 (file)
@@ -36,7 +36,8 @@ asmlinkage long __riscv_sys_ni_syscall(const struct pt_regs *);
                                        ulong)                                          \
                        __attribute__((alias(__stringify(___se_##prefix##name))));      \
        __diag_pop();                                                                   \
-       static long noinline ___se_##prefix##name(__MAP(x,__SC_LONG,__VA_ARGS__));      \
+       static long noinline ___se_##prefix##name(__MAP(x,__SC_LONG,__VA_ARGS__))       \
+                       __used;                                                         \
        static long ___se_##prefix##name(__MAP(x,__SC_LONG,__VA_ARGS__))
 
 #define SC_RISCV_REGS_TO_ARGS(x, ...) \
index ec0cab9fbddd0da98cb415af2732a4ede083886b..72ec1d9bd3f312ec05c6dc5f2342f06e24c58468 100644 (file)
@@ -319,7 +319,7 @@ unsigned long __must_check clear_user(void __user *to, unsigned long n)
 
 #define __get_kernel_nofault(dst, src, type, err_label)                        \
 do {                                                                   \
-       long __kr_err;                                                  \
+       long __kr_err = 0;                                              \
                                                                        \
        __get_user_nocheck(*((type *)(dst)), (type *)(src), __kr_err);  \
        if (unlikely(__kr_err))                                         \
@@ -328,7 +328,7 @@ do {                                                                        \
 
 #define __put_kernel_nofault(dst, src, type, err_label)                        \
 do {                                                                   \
-       long __kr_err;                                                  \
+       long __kr_err = 0;                                              \
                                                                        \
        __put_user_nocheck(*((type *)(src)), (type *)(dst), __kr_err);  \
        if (unlikely(__kr_err))                                         \
index 10aaa83db89ef74a6441f5782698dc82d7e0ee5c..95050ebe9ad00bce67e4a8e42611624a40734c41 100644 (file)
@@ -34,7 +34,7 @@
 #define AT_L3_CACHEGEOMETRY    47
 
 /* entries in ARCH_DLINFO */
-#define AT_VECTOR_SIZE_ARCH    9
+#define AT_VECTOR_SIZE_ARCH    10
 #define AT_MINSIGSTKSZ         51
 
 #endif /* _UAPI_ASM_RISCV_AUXVEC_H */
index 62fa393b2eb2ead77a85ce54a9c4c8b32d528f6b..3df4cb788c1fa459d629d629ef64f99c81891f6b 100644 (file)
@@ -74,5 +74,5 @@ quiet_cmd_compat_vdsold = VDSOLD  $@
                    rm $@.tmp
 
 # actual build commands
-quiet_cmd_compat_vdsoas = VDSOAS $@
+quiet_cmd_compat_vdsoas = VDSOAS  $@
       cmd_compat_vdsoas = $(COMPAT_CC) $(a_flags) $(COMPAT_CC_FLAGS) -c -o $@ $<
index 37e87fdcf6a00057663ffd636c50e85e865be3c4..30e12b310cab7397f91d622f8d4c20117e5e8c5f 100644 (file)
@@ -80,6 +80,8 @@ static int __patch_insn_set(void *addr, u8 c, size_t len)
         */
        lockdep_assert_held(&text_mutex);
 
+       preempt_disable();
+
        if (across_pages)
                patch_map(addr + PAGE_SIZE, FIX_TEXT_POKE1);
 
@@ -92,6 +94,8 @@ static int __patch_insn_set(void *addr, u8 c, size_t len)
        if (across_pages)
                patch_unmap(FIX_TEXT_POKE1);
 
+       preempt_enable();
+
        return 0;
 }
 NOKPROBE_SYMBOL(__patch_insn_set);
@@ -122,6 +126,8 @@ static int __patch_insn_write(void *addr, const void *insn, size_t len)
        if (!riscv_patch_in_stop_machine)
                lockdep_assert_held(&text_mutex);
 
+       preempt_disable();
+
        if (across_pages)
                patch_map(addr + PAGE_SIZE, FIX_TEXT_POKE1);
 
@@ -134,6 +140,8 @@ static int __patch_insn_write(void *addr, const void *insn, size_t len)
        if (across_pages)
                patch_unmap(FIX_TEXT_POKE1);
 
+       preempt_enable();
+
        return ret;
 }
 NOKPROBE_SYMBOL(__patch_insn_write);
index 92922dbd5b5c1f9b5d57643ecbd7a1599c5ac4c3..e4bc61c4e58af9c6c3914692c240021d053d72d8 100644 (file)
@@ -27,8 +27,6 @@
 #include <asm/vector.h>
 #include <asm/cpufeature.h>
 
-register unsigned long gp_in_global __asm__("gp");
-
 #if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK)
 #include <linux/stackprotector.h>
 unsigned long __stack_chk_guard __read_mostly;
@@ -37,7 +35,7 @@ EXPORT_SYMBOL(__stack_chk_guard);
 
 extern asmlinkage void ret_from_fork(void);
 
-void arch_cpu_idle(void)
+void noinstr arch_cpu_idle(void)
 {
        cpu_do_idle();
 }
@@ -207,7 +205,6 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
        if (unlikely(args->fn)) {
                /* Kernel thread */
                memset(childregs, 0, sizeof(struct pt_regs));
-               childregs->gp = gp_in_global;
                /* Supervisor/Machine, irqs on: */
                childregs->status = SR_PP | SR_PIE;
 
index 501e66debf69721d53db2515cea4df970a6b2784..5a2edd7f027e5d12e682349c3f1b54b51cd3b735 100644 (file)
@@ -119,6 +119,13 @@ static long __restore_v_state(struct pt_regs *regs, void __user *sc_vec)
        struct __sc_riscv_v_state __user *state = sc_vec;
        void __user *datap;
 
+       /*
+        * Mark the vstate as clean prior to performing the actual copy,
+        * to avoid getting the vstate incorrectly clobbered by the
+        * discarded vector state.
+        */
+       riscv_v_vstate_set_restore(current, regs);
+
        /* Copy everything of __sc_riscv_v_state except datap. */
        err = __copy_from_user(&current->thread.vstate, &state->v_state,
                               offsetof(struct __riscv_v_ext_state, datap));
@@ -133,13 +140,7 @@ static long __restore_v_state(struct pt_regs *regs, void __user *sc_vec)
         * Copy the whole vector content from user space datap. Use
         * copy_from_user to prevent information leak.
         */
-       err = copy_from_user(current->thread.vstate.datap, datap, riscv_v_vsize);
-       if (unlikely(err))
-               return err;
-
-       riscv_v_vstate_set_restore(current, regs);
-
-       return err;
+       return copy_from_user(current->thread.vstate.datap, datap, riscv_v_vsize);
 }
 #else
 #define save_v_state(task, regs) (0)
index 868d6280cf667e655de2d5003c2fd57d129b3127..05a16b1f0aee858f3abf7a28c647ad1146410da0 100644 (file)
@@ -122,7 +122,7 @@ void do_trap(struct pt_regs *regs, int signo, int code, unsigned long addr)
                print_vma_addr(KERN_CONT " in ", instruction_pointer(regs));
                pr_cont("\n");
                __show_regs(regs);
-               dump_instr(KERN_EMERG, regs);
+               dump_instr(KERN_INFO, regs);
        }
 
        force_sig_fault(signo, code, (void __user *)addr);
index 9b517fe1b8a8ecfddfae487dc9e829cc622334f2..272c431ac5b9f82c8181b673afcf236d85641feb 100644 (file)
@@ -37,6 +37,7 @@ endif
 
 # Disable -pg to prevent insert call site
 CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS)
+CFLAGS_REMOVE_hwprobe.o = $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS)
 
 # Disable profiling and instrumentation for VDSO code
 GCOV_PROFILE := n
index 893566e004b73fcf9a8dbc94f766e59cd00f1bb1..07d743f87b3f69f2e88e716fc7b7d4b064fe9c3e 100644 (file)
@@ -99,7 +99,7 @@ static void __ipi_flush_tlb_range_asid(void *info)
        local_flush_tlb_range_asid(d->start, d->size, d->stride, d->asid);
 }
 
-static void __flush_tlb_range(struct cpumask *cmask, unsigned long asid,
+static void __flush_tlb_range(const struct cpumask *cmask, unsigned long asid,
                              unsigned long start, unsigned long size,
                              unsigned long stride)
 {
@@ -200,7 +200,7 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 
 void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
-       __flush_tlb_range((struct cpumask *)cpu_online_mask, FLUSH_TLB_NO_ASID,
+       __flush_tlb_range(cpu_online_mask, FLUSH_TLB_NO_ASID,
                          start, end - start, PAGE_SIZE);
 }
 
index 7138d189cc420a2b4ca87b780503e7f4d53c9d7a..0c4cad7d5a5b1199c900f339e7872dbd9196d1ab 100644 (file)
 #include <asm/barrier.h>
 #include <asm/cmpxchg.h>
 
-static inline int arch_atomic_read(const atomic_t *v)
+static __always_inline int arch_atomic_read(const atomic_t *v)
 {
        return __atomic_read(v);
 }
 #define arch_atomic_read arch_atomic_read
 
-static inline void arch_atomic_set(atomic_t *v, int i)
+static __always_inline void arch_atomic_set(atomic_t *v, int i)
 {
        __atomic_set(v, i);
 }
 #define arch_atomic_set arch_atomic_set
 
-static inline int arch_atomic_add_return(int i, atomic_t *v)
+static __always_inline int arch_atomic_add_return(int i, atomic_t *v)
 {
        return __atomic_add_barrier(i, &v->counter) + i;
 }
 #define arch_atomic_add_return arch_atomic_add_return
 
-static inline int arch_atomic_fetch_add(int i, atomic_t *v)
+static __always_inline int arch_atomic_fetch_add(int i, atomic_t *v)
 {
        return __atomic_add_barrier(i, &v->counter);
 }
 #define arch_atomic_fetch_add arch_atomic_fetch_add
 
-static inline void arch_atomic_add(int i, atomic_t *v)
+static __always_inline void arch_atomic_add(int i, atomic_t *v)
 {
        __atomic_add(i, &v->counter);
 }
@@ -50,11 +50,11 @@ static inline void arch_atomic_add(int i, atomic_t *v)
 #define arch_atomic_fetch_sub(_i, _v)  arch_atomic_fetch_add(-(int)(_i), _v)
 
 #define ATOMIC_OPS(op)                                                 \
-static inline void arch_atomic_##op(int i, atomic_t *v)                        \
+static __always_inline void arch_atomic_##op(int i, atomic_t *v)       \
 {                                                                      \
        __atomic_##op(i, &v->counter);                                  \
 }                                                                      \
-static inline int arch_atomic_fetch_##op(int i, atomic_t *v)           \
+static __always_inline int arch_atomic_fetch_##op(int i, atomic_t *v)  \
 {                                                                      \
        return __atomic_##op##_barrier(i, &v->counter);                 \
 }
@@ -74,7 +74,7 @@ ATOMIC_OPS(xor)
 
 #define arch_atomic_xchg(v, new)       (arch_xchg(&((v)->counter), new))
 
-static inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new)
+static __always_inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new)
 {
        return __atomic_cmpxchg(&v->counter, old, new);
 }
@@ -82,31 +82,31 @@ static inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new)
 
 #define ATOMIC64_INIT(i)  { (i) }
 
-static inline s64 arch_atomic64_read(const atomic64_t *v)
+static __always_inline s64 arch_atomic64_read(const atomic64_t *v)
 {
        return __atomic64_read(v);
 }
 #define arch_atomic64_read arch_atomic64_read
 
-static inline void arch_atomic64_set(atomic64_t *v, s64 i)
+static __always_inline void arch_atomic64_set(atomic64_t *v, s64 i)
 {
        __atomic64_set(v, i);
 }
 #define arch_atomic64_set arch_atomic64_set
 
-static inline s64 arch_atomic64_add_return(s64 i, atomic64_t *v)
+static __always_inline s64 arch_atomic64_add_return(s64 i, atomic64_t *v)
 {
        return __atomic64_add_barrier(i, (long *)&v->counter) + i;
 }
 #define arch_atomic64_add_return arch_atomic64_add_return
 
-static inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
+static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
 {
        return __atomic64_add_barrier(i, (long *)&v->counter);
 }
 #define arch_atomic64_fetch_add arch_atomic64_fetch_add
 
-static inline void arch_atomic64_add(s64 i, atomic64_t *v)
+static __always_inline void arch_atomic64_add(s64 i, atomic64_t *v)
 {
        __atomic64_add(i, (long *)&v->counter);
 }
@@ -114,20 +114,20 @@ static inline void arch_atomic64_add(s64 i, atomic64_t *v)
 
 #define arch_atomic64_xchg(v, new)     (arch_xchg(&((v)->counter), new))
 
-static inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
+static __always_inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
 {
        return __atomic64_cmpxchg((long *)&v->counter, old, new);
 }
 #define arch_atomic64_cmpxchg arch_atomic64_cmpxchg
 
-#define ATOMIC64_OPS(op)                                               \
-static inline void arch_atomic64_##op(s64 i, atomic64_t *v)            \
-{                                                                      \
-       __atomic64_##op(i, (long *)&v->counter);                        \
-}                                                                      \
-static inline long arch_atomic64_fetch_##op(s64 i, atomic64_t *v)      \
-{                                                                      \
-       return __atomic64_##op##_barrier(i, (long *)&v->counter);       \
+#define ATOMIC64_OPS(op)                                                       \
+static __always_inline void arch_atomic64_##op(s64 i, atomic64_t *v)           \
+{                                                                              \
+       __atomic64_##op(i, (long *)&v->counter);                                \
+}                                                                              \
+static __always_inline long arch_atomic64_fetch_##op(s64 i, atomic64_t *v)     \
+{                                                                              \
+       return __atomic64_##op##_barrier(i, (long *)&v->counter);               \
 }
 
 ATOMIC64_OPS(and)
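The `ATOMIC64_OPS()` generator above stamps out a void variant and a value-returning `fetch` variant per operation. A minimal portable sketch of the same macro-expansion pattern, using GCC `__atomic` builtins in place of the s390 inline assembly (all `sketch_` names are illustrative, not kernel API):

```c
#include <stdint.h>

/*
 * Sketch of the ATOMIC64_OPS() pattern: one macro generates both the
 * void "op" variant and the value-returning "fetch_op" variant.
 * The real s390 code uses LAN/LAO/LAX-style instructions; here the
 * GCC __atomic builtins stand in for them.
 */
#define ATOMIC64_OPS_SKETCH(op, builtin)                                \
static inline void sketch_atomic64_##op(int64_t i, int64_t *v)          \
{                                                                       \
        builtin(v, i, __ATOMIC_RELAXED);        /* result discarded */  \
}                                                                       \
static inline int64_t sketch_atomic64_fetch_##op(int64_t i, int64_t *v) \
{                                                                       \
        return builtin(v, i, __ATOMIC_SEQ_CST); /* returns old value */ \
}

ATOMIC64_OPS_SKETCH(and, __atomic_fetch_and)
ATOMIC64_OPS_SKETCH(or,  __atomic_fetch_or)
ATOMIC64_OPS_SKETCH(xor, __atomic_fetch_xor)
```

The `__always_inline` change in the hunk above matters precisely because these helpers must never become out-of-line calls from `noinstr` code.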
index 50510e08b893b557dc0f7c8211093a7bf5bb91d5..7fa5f96a553a4720c5f5d41119d10c7cc954ce9e 100644 (file)
@@ -8,7 +8,7 @@
 #ifndef __ARCH_S390_ATOMIC_OPS__
 #define __ARCH_S390_ATOMIC_OPS__
 
-static inline int __atomic_read(const atomic_t *v)
+static __always_inline int __atomic_read(const atomic_t *v)
 {
        int c;
 
@@ -18,14 +18,14 @@ static inline int __atomic_read(const atomic_t *v)
        return c;
 }
 
-static inline void __atomic_set(atomic_t *v, int i)
+static __always_inline void __atomic_set(atomic_t *v, int i)
 {
        asm volatile(
                "       st      %1,%0\n"
                : "=R" (v->counter) : "d" (i));
 }
 
-static inline s64 __atomic64_read(const atomic64_t *v)
+static __always_inline s64 __atomic64_read(const atomic64_t *v)
 {
        s64 c;
 
@@ -35,7 +35,7 @@ static inline s64 __atomic64_read(const atomic64_t *v)
        return c;
 }
 
-static inline void __atomic64_set(atomic64_t *v, s64 i)
+static __always_inline void __atomic64_set(atomic64_t *v, s64 i)
 {
        asm volatile(
                "       stg     %1,%0\n"
@@ -45,7 +45,7 @@ static inline void __atomic64_set(atomic64_t *v, s64 i)
 #ifdef CONFIG_HAVE_MARCH_Z196_FEATURES
 
 #define __ATOMIC_OP(op_name, op_type, op_string, op_barrier)           \
-static inline op_type op_name(op_type val, op_type *ptr)               \
+static __always_inline op_type op_name(op_type val, op_type *ptr)      \
 {                                                                      \
        op_type old;                                                    \
                                                                        \
@@ -96,7 +96,7 @@ __ATOMIC_CONST_OPS(__atomic64_add_const, long, "agsi")
 #else /* CONFIG_HAVE_MARCH_Z196_FEATURES */
 
 #define __ATOMIC_OP(op_name, op_string)                                        \
-static inline int op_name(int val, int *ptr)                           \
+static __always_inline int op_name(int val, int *ptr)                  \
 {                                                                      \
        int old, new;                                                   \
                                                                        \
@@ -122,7 +122,7 @@ __ATOMIC_OPS(__atomic_xor, "xr")
 #undef __ATOMIC_OPS
 
 #define __ATOMIC64_OP(op_name, op_string)                              \
-static inline long op_name(long val, long *ptr)                                \
+static __always_inline long op_name(long val, long *ptr)               \
 {                                                                      \
        long old, new;                                                  \
                                                                        \
@@ -154,7 +154,7 @@ __ATOMIC64_OPS(__atomic64_xor, "xgr")
 
 #endif /* CONFIG_HAVE_MARCH_Z196_FEATURES */
 
-static inline int __atomic_cmpxchg(int *ptr, int old, int new)
+static __always_inline int __atomic_cmpxchg(int *ptr, int old, int new)
 {
        asm volatile(
                "       cs      %[old],%[new],%[ptr]"
@@ -164,7 +164,7 @@ static inline int __atomic_cmpxchg(int *ptr, int old, int new)
        return old;
 }
 
-static inline bool __atomic_cmpxchg_bool(int *ptr, int old, int new)
+static __always_inline bool __atomic_cmpxchg_bool(int *ptr, int old, int new)
 {
        int old_expected = old;
 
@@ -176,7 +176,7 @@ static inline bool __atomic_cmpxchg_bool(int *ptr, int old, int new)
        return old == old_expected;
 }
 
-static inline long __atomic64_cmpxchg(long *ptr, long old, long new)
+static __always_inline long __atomic64_cmpxchg(long *ptr, long old, long new)
 {
        asm volatile(
                "       csg     %[old],%[new],%[ptr]"
@@ -186,7 +186,7 @@ static inline long __atomic64_cmpxchg(long *ptr, long old, long new)
        return old;
 }
 
-static inline bool __atomic64_cmpxchg_bool(long *ptr, long old, long new)
+static __always_inline bool __atomic64_cmpxchg_bool(long *ptr, long old, long new)
 {
        long old_expected = old;
 
index bf15da0fedbca5ed6cc0d24a6af9b348b2776fcd..0e3da500e98c19109676f690385bb6da44bf971c 100644 (file)
 #define PREEMPT_NEED_RESCHED   0x80000000
 #define PREEMPT_ENABLED        (0 + PREEMPT_NEED_RESCHED)
 
-static inline int preempt_count(void)
+static __always_inline int preempt_count(void)
 {
        return READ_ONCE(S390_lowcore.preempt_count) & ~PREEMPT_NEED_RESCHED;
 }
 
-static inline void preempt_count_set(int pc)
+static __always_inline void preempt_count_set(int pc)
 {
        int old, new;
 
@@ -29,22 +29,22 @@ static inline void preempt_count_set(int pc)
                                  old, new) != old);
 }
 
-static inline void set_preempt_need_resched(void)
+static __always_inline void set_preempt_need_resched(void)
 {
        __atomic_and(~PREEMPT_NEED_RESCHED, &S390_lowcore.preempt_count);
 }
 
-static inline void clear_preempt_need_resched(void)
+static __always_inline void clear_preempt_need_resched(void)
 {
        __atomic_or(PREEMPT_NEED_RESCHED, &S390_lowcore.preempt_count);
 }
 
-static inline bool test_preempt_need_resched(void)
+static __always_inline bool test_preempt_need_resched(void)
 {
        return !(READ_ONCE(S390_lowcore.preempt_count) & PREEMPT_NEED_RESCHED);
 }
 
-static inline void __preempt_count_add(int val)
+static __always_inline void __preempt_count_add(int val)
 {
        /*
         * With some obscure config options and CONFIG_PROFILE_ALL_BRANCHES
@@ -59,17 +59,17 @@ static inline void __preempt_count_add(int val)
        __atomic_add(val, &S390_lowcore.preempt_count);
 }
 
-static inline void __preempt_count_sub(int val)
+static __always_inline void __preempt_count_sub(int val)
 {
        __preempt_count_add(-val);
 }
 
-static inline bool __preempt_count_dec_and_test(void)
+static __always_inline bool __preempt_count_dec_and_test(void)
 {
        return __atomic_add(-1, &S390_lowcore.preempt_count) == 1;
 }
 
-static inline bool should_resched(int preempt_offset)
+static __always_inline bool should_resched(int preempt_offset)
 {
        return unlikely(READ_ONCE(S390_lowcore.preempt_count) ==
                        preempt_offset);
@@ -79,45 +79,45 @@ static inline bool should_resched(int preempt_offset)
 
 #define PREEMPT_ENABLED        (0)
 
-static inline int preempt_count(void)
+static __always_inline int preempt_count(void)
 {
        return READ_ONCE(S390_lowcore.preempt_count);
 }
 
-static inline void preempt_count_set(int pc)
+static __always_inline void preempt_count_set(int pc)
 {
        S390_lowcore.preempt_count = pc;
 }
 
-static inline void set_preempt_need_resched(void)
+static __always_inline void set_preempt_need_resched(void)
 {
 }
 
-static inline void clear_preempt_need_resched(void)
+static __always_inline void clear_preempt_need_resched(void)
 {
 }
 
-static inline bool test_preempt_need_resched(void)
+static __always_inline bool test_preempt_need_resched(void)
 {
        return false;
 }
 
-static inline void __preempt_count_add(int val)
+static __always_inline void __preempt_count_add(int val)
 {
        S390_lowcore.preempt_count += val;
 }
 
-static inline void __preempt_count_sub(int val)
+static __always_inline void __preempt_count_sub(int val)
 {
        S390_lowcore.preempt_count -= val;
 }
 
-static inline bool __preempt_count_dec_and_test(void)
+static __always_inline bool __preempt_count_dec_and_test(void)
 {
        return !--S390_lowcore.preempt_count && tif_need_resched();
 }
 
-static inline bool should_resched(int preempt_offset)
+static __always_inline bool should_resched(int preempt_offset)
 {
        return unlikely(preempt_count() == preempt_offset &&
                        tif_need_resched());
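The preempt.h hunks above rely on s390's inverted `PREEMPT_NEED_RESCHED` encoding: the flag occupies the top bit of the preempt count and is *cleared* when a reschedule is needed, so "count is zero and resched needed" collapses into a single compare against zero. A simplified, single-threaded sketch of that trick (non-atomic on purpose; the real code uses `__atomic_and`/`__atomic_or` on the lowcore field):

```c
#include <stdbool.h>

#define SKETCH_NEED_RESCHED 0x80000000u

/* Bit set = no resched needed; preemption enabled, nothing pending. */
static unsigned int sketch_count = SKETCH_NEED_RESCHED;

static inline int sketch_preempt_count(void)
{
        return (int)(sketch_count & ~SKETCH_NEED_RESCHED);
}

static inline void sketch_set_need_resched(void)
{
        sketch_count &= ~SKETCH_NEED_RESCHED;   /* clearing the bit = pending */
}

/* One comparison covers "count == 0 AND resched pending". */
static inline bool sketch_should_resched(void)
{
        return sketch_count == 0;
}
```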
index 787394978bc0f86400ce30214c2e1b8eb3e82675..3dc85638bc63b7d96eb3ebc25c558cff967395c6 100644 (file)
@@ -635,6 +635,7 @@ SYM_DATA_START_LOCAL(daton_psw)
 SYM_DATA_END(daton_psw)
 
        .section .rodata, "a"
+       .balign 8
 #define SYSCALL(esame,emu)     .quad __s390x_ ## esame
 SYM_DATA_START(sys_call_table)
 #include "asm/syscall_table.h"
index 823d652e3917f8653fe71bb5c67a72c2653cf6c3..4ad472d130a3c075cda96949a605e080ef8d3e1a 100644 (file)
@@ -90,7 +90,6 @@ static void paicrypt_event_destroy(struct perf_event *event)
                                                 event->cpu);
        struct paicrypt_map *cpump = mp->mapptr;
 
-       cpump->event = NULL;
        static_branch_dec(&pai_key);
        mutex_lock(&pai_reserve_mutex);
        debug_sprintf_event(cfm_dbg, 5, "%s event %#llx cpu %d users %d"
@@ -356,10 +355,15 @@ static int paicrypt_add(struct perf_event *event, int flags)
 
 static void paicrypt_stop(struct perf_event *event, int flags)
 {
-       if (!event->attr.sample_period) /* Counting */
+       struct paicrypt_mapptr *mp = this_cpu_ptr(paicrypt_root.mapptr);
+       struct paicrypt_map *cpump = mp->mapptr;
+
+       if (!event->attr.sample_period) {       /* Counting */
                paicrypt_read(event);
-       else                            /* Sampling */
+       } else {                                /* Sampling */
                perf_sched_cb_dec(event->pmu);
+               cpump->event = NULL;
+       }
        event->hw.state = PERF_HES_STOPPED;
 }
 
index 616a25606cd63dcda97a0b781c88b55dc86f0032..a6da7e0cc7a66dac02e9524feb802b0bfee8e0e8 100644 (file)
@@ -122,7 +122,6 @@ static void paiext_event_destroy(struct perf_event *event)
 
        free_page(PAI_SAVE_AREA(event));
        mutex_lock(&paiext_reserve_mutex);
-       cpump->event = NULL;
        if (refcount_dec_and_test(&cpump->refcnt))      /* Last reference gone */
                paiext_free(mp);
        paiext_root_free();
@@ -362,10 +361,15 @@ static int paiext_add(struct perf_event *event, int flags)
 
 static void paiext_stop(struct perf_event *event, int flags)
 {
-       if (!event->attr.sample_period) /* Counting */
+       struct paiext_mapptr *mp = this_cpu_ptr(paiext_root.mapptr);
+       struct paiext_map *cpump = mp->mapptr;
+
+       if (!event->attr.sample_period) {       /* Counting */
                paiext_read(event);
-       else                            /* Sampling */
+       } else {                                /* Sampling */
                perf_sched_cb_dec(event->pmu);
+               cpump->event = NULL;
+       }
        event->hw.state = PERF_HES_STOPPED;
 }
 
index c421dd44ffbe0346ab31028d433e5b3b7a2df626..0c66b32e0f9f1b54b4959a51d8cfc03984b4d52d 100644 (file)
@@ -75,7 +75,7 @@ static enum fault_type get_fault_type(struct pt_regs *regs)
                if (!IS_ENABLED(CONFIG_PGSTE))
                        return KERNEL_FAULT;
                gmap = (struct gmap *)S390_lowcore.gmap;
-               if (regs->cr1 == gmap->asce)
+               if (gmap && gmap->asce == regs->cr1)
                        return GMAP_FAULT;
                return KERNEL_FAULT;
        }
index 4fff6ed46e902cfbe723cf5ed5ce517e2d131891..10a6251f58f3e0789cf5507322b92d92a87bc2eb 100644 (file)
@@ -2633,6 +2633,32 @@ config MITIGATION_RFDS
          stored in floating point, vector and integer registers.
          See also <file:Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst>
 
+choice
+       prompt "Clear branch history"
+       depends on CPU_SUP_INTEL
+       default SPECTRE_BHI_ON
+       help
+         Enable BHI mitigations. BHI attacks are a form of Spectre V2 attacks
+         where the branch history buffer is poisoned to speculatively steer
+         indirect branches.
+         See <file:Documentation/admin-guide/hw-vuln/spectre.rst>
+
+config SPECTRE_BHI_ON
+       bool "on"
+       help
+         Equivalent to setting spectre_bhi=on command line parameter.
+config SPECTRE_BHI_OFF
+       bool "off"
+       help
+         Equivalent to setting spectre_bhi=off command line parameter.
+config SPECTRE_BHI_AUTO
+       bool "auto"
+       depends on BROKEN
+       help
+         Equivalent to setting spectre_bhi=auto command line parameter.
+
+endchoice
+
 endif
 
 config ARCH_HAS_ADD_PAGES
index d07be9d05cd03781072798e5802a5ef34d7a057a..b31ef2424d194b96d07b601d4eeac4b23d637d27 100644 (file)
@@ -3,19 +3,28 @@
  * Confidential Computing Platform Capability checks
  *
  * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ * Copyright (C) 2024 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
  *
  * Author: Tom Lendacky <thomas.lendacky@amd.com>
  */
 
 #include <linux/export.h>
 #include <linux/cc_platform.h>
+#include <linux/string.h>
+#include <linux/random.h>
 
+#include <asm/archrandom.h>
 #include <asm/coco.h>
 #include <asm/processor.h>
 
 enum cc_vendor cc_vendor __ro_after_init = CC_VENDOR_NONE;
 u64 cc_mask __ro_after_init;
 
+static struct cc_attr_flags {
+       __u64 host_sev_snp      : 1,
+             __resv            : 63;
+} cc_flags;
+
 static bool noinstr intel_cc_platform_has(enum cc_attr attr)
 {
        switch (attr) {
@@ -89,6 +98,9 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
        case CC_ATTR_GUEST_SEV_SNP:
                return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
 
+       case CC_ATTR_HOST_SEV_SNP:
+               return cc_flags.host_sev_snp;
+
        default:
                return false;
        }
@@ -148,3 +160,84 @@ u64 cc_mkdec(u64 val)
        }
 }
 EXPORT_SYMBOL_GPL(cc_mkdec);
+
+static void amd_cc_platform_clear(enum cc_attr attr)
+{
+       switch (attr) {
+       case CC_ATTR_HOST_SEV_SNP:
+               cc_flags.host_sev_snp = 0;
+               break;
+       default:
+               break;
+       }
+}
+
+void cc_platform_clear(enum cc_attr attr)
+{
+       switch (cc_vendor) {
+       case CC_VENDOR_AMD:
+               amd_cc_platform_clear(attr);
+               break;
+       default:
+               break;
+       }
+}
+
+static void amd_cc_platform_set(enum cc_attr attr)
+{
+       switch (attr) {
+       case CC_ATTR_HOST_SEV_SNP:
+               cc_flags.host_sev_snp = 1;
+               break;
+       default:
+               break;
+       }
+}
+
+void cc_platform_set(enum cc_attr attr)
+{
+       switch (cc_vendor) {
+       case CC_VENDOR_AMD:
+               amd_cc_platform_set(attr);
+               break;
+       default:
+               break;
+       }
+}
+
+__init void cc_random_init(void)
+{
+       /*
+        * The seed is 32 bytes (in units of longs), which is 256 bits, which
+        * is the security level that the RNG is targeting.
+        */
+       unsigned long rng_seed[32 / sizeof(long)];
+       size_t i, longs;
+
+       if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
+               return;
+
+       /*
+        * Since the CoCo threat model includes the host, the only reliable
+        * source of entropy that can be neither observed nor manipulated is
+        * RDRAND. Usually, RDRAND failure is considered tolerable, but since
+        * CoCo guests have no other unobservable source of entropy, it's
+        * important to at least ensure the RNG gets some initial random seeds.
+        */
+       for (i = 0; i < ARRAY_SIZE(rng_seed); i += longs) {
+               longs = arch_get_random_longs(&rng_seed[i], ARRAY_SIZE(rng_seed) - i);
+
+               /*
+                * A zero return value means that the guest doesn't have RDRAND
+                * or the CPU is physically broken, and in both cases that
+                * means most crypto inside of the CoCo instance will be
+                * broken, defeating the purpose of CoCo in the first place. So
+                * just panic here because it's absolutely unsafe to continue
+                * executing.
+                */
+               if (longs == 0)
+                       panic("RDRAND is defective.");
+       }
+       add_device_randomness(rng_seed, sizeof(rng_seed));
+       memzero_explicit(rng_seed, sizeof(rng_seed));
+}
index 6356060caaf311af8370ccaeb69aab85847b62d1..6de50b80702e61087c432faf580a0ab063d22507 100644 (file)
@@ -49,7 +49,7 @@ static __always_inline bool do_syscall_x64(struct pt_regs *regs, int nr)
 
        if (likely(unr < NR_syscalls)) {
                unr = array_index_nospec(unr, NR_syscalls);
-               regs->ax = sys_call_table[unr](regs);
+               regs->ax = x64_sys_call(regs, unr);
                return true;
        }
        return false;
@@ -66,7 +66,7 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
 
        if (IS_ENABLED(CONFIG_X86_X32_ABI) && likely(xnr < X32_NR_syscalls)) {
                xnr = array_index_nospec(xnr, X32_NR_syscalls);
-               regs->ax = x32_sys_call_table[xnr](regs);
+               regs->ax = x32_sys_call(regs, xnr);
                return true;
        }
        return false;
@@ -162,7 +162,7 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs, int nr)
 
        if (likely(unr < IA32_NR_syscalls)) {
                unr = array_index_nospec(unr, IA32_NR_syscalls);
-               regs->ax = ia32_sys_call_table[unr](regs);
+               regs->ax = ia32_sys_call(regs, unr);
        } else if (nr != -1) {
                regs->ax = __ia32_sys_ni_syscall(regs);
        }
@@ -189,7 +189,7 @@ static __always_inline bool int80_is_external(void)
 }
 
 /**
- * int80_emulation - 32-bit legacy syscall entry
+ * do_int80_emulation - 32-bit legacy syscall C entry from asm
  *
  * This entry point can be used by 32-bit and 64-bit programs to perform
  * 32-bit system calls.  Instances of INT $0x80 can be found inline in
@@ -207,7 +207,7 @@ static __always_inline bool int80_is_external(void)
  *   eax:                              system call number
  *   ebx, ecx, edx, esi, edi, ebp:     arg1 - arg 6
  */
-DEFINE_IDTENTRY_RAW(int80_emulation)
+__visible noinstr void do_int80_emulation(struct pt_regs *regs)
 {
        int nr;
 
index 8af2a26b24f6a9783f9bb348cd67c15e1c3799c8..1b5be07f86698a3b634a0d83b6578775781d1739 100644 (file)
@@ -116,6 +116,7 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
        /* clobbers %rax, make sure it is after saving the syscall nr */
        IBRS_ENTER
        UNTRAIN_RET
+       CLEAR_BRANCH_HISTORY
 
        call    do_syscall_64           /* returns with IRQs disabled */
 
@@ -1491,3 +1492,63 @@ SYM_CODE_START_NOALIGN(rewind_stack_and_make_dead)
        call    make_task_dead
 SYM_CODE_END(rewind_stack_and_make_dead)
 .popsection
+
+/*
+ * This sequence executes branches in order to remove user branch information
+ * from the branch history tracker in the Branch Predictor, therefore removing
+ * user influence on subsequent BTB lookups.
+ *
+ * It should be used on parts prior to Alder Lake. Newer parts should use the
+ * BHI_DIS_S hardware control instead. If a pre-Alder Lake part is being
+ * virtualized on newer hardware the VMM should protect against BHI attacks by
+ * setting BHI_DIS_S for the guests.
+ *
+ * CALLs/RETs are necessary to prevent the Loop Stream Detector (LSD) from engaging
+ * and not clearing the branch history. The call tree looks like:
+ *
+ * call 1
+ *    call 2
+ *      call 2
+ *        call 2
+ *          call 2
+ *           call 2
+ *           ret
+ *         ret
+ *        ret
+ *      ret
+ *    ret
+ * ret
+ *
+ * This means that the stack is non-constant and ORC can't unwind it with %rsp
+ * alone.  Therefore we unconditionally set up the frame pointer, which allows
+ * ORC to unwind properly.
+ *
+ * The alignment is for performance and not for safety, and may be safely
+ * refactored in the future if needed.
+ */
+SYM_FUNC_START(clear_bhb_loop)
+       push    %rbp
+       mov     %rsp, %rbp
+       movl    $5, %ecx
+       ANNOTATE_INTRA_FUNCTION_CALL
+       call    1f
+       jmp     5f
+       .align 64, 0xcc
+       ANNOTATE_INTRA_FUNCTION_CALL
+1:     call    2f
+       RET
+       .align 64, 0xcc
+2:     movl    $5, %eax
+3:     jmp     4f
+       nop
+4:     sub     $1, %eax
+       jnz     3b
+       sub     $1, %ecx
+       jnz     1b
+       RET
+5:     lfence
+       pop     %rbp
+       RET
+SYM_FUNC_END(clear_bhb_loop)
+EXPORT_SYMBOL_GPL(clear_bhb_loop)
+STACK_FRAME_NON_STANDARD(clear_bhb_loop)
index eabf48c4d4b4c30367792f5d9a0b158a9ecf8a04..c779046cc3fe792658a984648328000535812dea 100644 (file)
@@ -92,6 +92,7 @@ SYM_INNER_LABEL(entry_SYSENTER_compat_after_hwframe, SYM_L_GLOBAL)
 
        IBRS_ENTER
        UNTRAIN_RET
+       CLEAR_BRANCH_HISTORY
 
        /*
         * SYSENTER doesn't filter flags, so we need to clear NT and AC
@@ -206,6 +207,7 @@ SYM_INNER_LABEL(entry_SYSCALL_compat_after_hwframe, SYM_L_GLOBAL)
 
        IBRS_ENTER
        UNTRAIN_RET
+       CLEAR_BRANCH_HISTORY
 
        movq    %rsp, %rdi
        call    do_fast_syscall_32
@@ -276,3 +278,17 @@ SYM_INNER_LABEL(entry_SYSRETL_compat_end, SYM_L_GLOBAL)
        ANNOTATE_NOENDBR
        int3
 SYM_CODE_END(entry_SYSCALL_compat)
+
+/*
+ * int 0x80 is used by 32-bit mode as a system call entry. Normally IDT entries
+ * point to C routines; however, since this is a system call interface the branch
+ * history needs to be scrubbed to protect against BHI attacks, and that
+ * scrubbing needs to take place in assembly code prior to entering any C
+ * routines.
+ */
+SYM_CODE_START(int80_emulation)
+       ANNOTATE_NOENDBR
+       UNWIND_HINT_FUNC
+       CLEAR_BRANCH_HISTORY
+       jmp do_int80_emulation
+SYM_CODE_END(int80_emulation)
index 8cfc9bc73e7f8b21f748367256a78df3dc5e5b4a..c2235bae17ef665098342c323a24e4b388c169cb 100644 (file)
 #include <asm/syscalls_32.h>
 #undef __SYSCALL
 
+/*
+ * The sys_call_table[] is no longer used for system calls, but
+ * kernel/trace/trace_syscalls.c still wants to know the system
+ * call address.
+ */
+#ifdef CONFIG_X86_32
 #define __SYSCALL(nr, sym) __ia32_##sym,
-
-__visible const sys_call_ptr_t ia32_sys_call_table[] = {
+const sys_call_ptr_t sys_call_table[] = {
 #include <asm/syscalls_32.h>
 };
+#undef __SYSCALL
+#endif
+
+#define __SYSCALL(nr, sym) case nr: return __ia32_##sym(regs);
+
+long ia32_sys_call(const struct pt_regs *regs, unsigned int nr)
+{
+       switch (nr) {
+       #include <asm/syscalls_32.h>
+       default: return __ia32_sys_ni_syscall(regs);
+       }
+};
index be120eec1fc9f95c69c23074bcd3fbc355b90d47..33b3f09e6f151e11faca1c9d13f0eb4917f3392b 100644 (file)
 #include <asm/syscalls_64.h>
 #undef __SYSCALL
 
+/*
+ * The sys_call_table[] is no longer used for system calls, but
+ * kernel/trace/trace_syscalls.c still wants to know the system
+ * call address.
+ */
 #define __SYSCALL(nr, sym) __x64_##sym,
-
-asmlinkage const sys_call_ptr_t sys_call_table[] = {
+const sys_call_ptr_t sys_call_table[] = {
 #include <asm/syscalls_64.h>
 };
+#undef __SYSCALL
+
+#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
+
+long x64_sys_call(const struct pt_regs *regs, unsigned int nr)
+{
+       switch (nr) {
+       #include <asm/syscalls_64.h>
+       default: return __x64_sys_ni_syscall(regs);
+       }
+};
index bdd0e03a1265d23e474c5c45e1bd64e7b14b7b79..03de4a93213182c6fa5809b077a54ea51be411ea 100644 (file)
 #include <asm/syscalls_x32.h>
 #undef __SYSCALL
 
-#define __SYSCALL(nr, sym) __x64_##sym,
+#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
 
-asmlinkage const sys_call_ptr_t x32_sys_call_table[] = {
-#include <asm/syscalls_x32.h>
+long x32_sys_call(const struct pt_regs *regs, unsigned int nr)
+{
+       switch (nr) {
+       #include <asm/syscalls_x32.h>
+       default: return __x64_sys_ni_syscall(regs);
+       }
 };
index 2641ba620f12a51d4c5d71ceba3bd28557926bfb..e010bfed84170570ddc74d307fd0620c7bc6a3c9 100644 (file)
@@ -1237,11 +1237,11 @@ pebs_update_state(bool needed_cb, struct cpu_hw_events *cpuc,
        struct pmu *pmu = event->pmu;
 
        /*
-        * Make sure we get updated with the first PEBS
-        * event. It will trigger also during removal, but
-        * that does not hurt:
+        * Make sure we get updated with the first PEBS event.
+        * During removal, ->pebs_data_cfg is still valid for
+        * the last PEBS event. Don't clear it.
         */
-       if (cpuc->n_pebs == 1)
+       if ((cpuc->n_pebs == 1) && add)
                cpuc->pebs_data_cfg = PEBS_UPDATE_DS_SW;
 
        if (needed_cb != pebs_needs_sched_cb(cpuc)) {
index fb7388bbc212f9b1435b47206ae586cf84505846..c086699b0d0c59fc62834bb457963f1b81b541d3 100644 (file)
@@ -22,6 +22,7 @@ static inline void cc_set_mask(u64 mask)
 
 u64 cc_mkenc(u64 val);
 u64 cc_mkdec(u64 val);
+void cc_random_init(void);
 #else
 #define cc_vendor (CC_VENDOR_NONE)
 
@@ -34,6 +35,7 @@ static inline u64 cc_mkdec(u64 val)
 {
        return val;
 }
+static inline void cc_random_init(void) { }
 #endif
 
 #endif /* _ASM_X86_COCO_H */
index 42157ddcc09d436fef9684a813b3e9b9e666e9da..686e92d2663eeeacd90a46568ae37b3db76b9e00 100644 (file)
@@ -33,6 +33,8 @@ enum cpuid_leafs
        CPUID_7_EDX,
        CPUID_8000_001F_EAX,
        CPUID_8000_0021_EAX,
+       CPUID_LNX_5,
+       NR_CPUID_WORDS,
 };
 
 #define X86_CAP_FMT_NUM "%d:%d"
index a38f8f9ba65729125234814c08547498e4e3b8bc..3c7434329661c66e7c34283f0a3f2c59a87f8044 100644 (file)
 
 /*
  * Extended auxiliary flags: Linux defined - for features scattered in various
- * CPUID levels like 0x80000022, etc.
+ * CPUID levels like 0x80000022, etc., and Linux-defined features.
  *
  * Reuse free bits when adding new feature flags!
  */
 #define X86_FEATURE_AMD_LBR_PMC_FREEZE (21*32+ 0) /* AMD LBR and PMC Freeze */
+#define X86_FEATURE_CLEAR_BHB_LOOP     (21*32+ 1) /* "" Clear branch history at syscall entry using SW loop */
+#define X86_FEATURE_BHI_CTRL           (21*32+ 2) /* "" BHI_DIS_S HW control available */
+#define X86_FEATURE_CLEAR_BHB_HW       (21*32+ 3) /* "" BHI_DIS_S HW control enabled */
+#define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
 
 /*
  * BUG word(s)
 #define X86_BUG_SRSO                   X86_BUG(1*32 + 0) /* AMD SRSO bug */
 #define X86_BUG_DIV0                   X86_BUG(1*32 + 1) /* AMD DIV0 speculation bug */
 #define X86_BUG_RFDS                   X86_BUG(1*32 + 2) /* CPU is vulnerable to Register File Data Sampling */
+#define X86_BUG_BHI                    X86_BUG(1*32 + 3) /* CPU is affected by Branch History Injection */
 #endif /* _ASM_X86_CPUFEATURES_H */
index 05956bd8bacf50e35f463c13720a38735fe8b1b5..e72c2b87295799af9d44eb84f59d095f4f90acfd 100644 (file)
 #define SPEC_CTRL_SSBD                 BIT(SPEC_CTRL_SSBD_SHIFT)       /* Speculative Store Bypass Disable */
 #define SPEC_CTRL_RRSBA_DIS_S_SHIFT    6          /* Disable RRSBA behavior */
 #define SPEC_CTRL_RRSBA_DIS_S          BIT(SPEC_CTRL_RRSBA_DIS_S_SHIFT)
+#define SPEC_CTRL_BHI_DIS_S_SHIFT      10         /* Disable Branch History Injection behavior */
+#define SPEC_CTRL_BHI_DIS_S            BIT(SPEC_CTRL_BHI_DIS_S_SHIFT)
 
 /* A mask for bits which the kernel toggles when controlling mitigations */
 #define SPEC_CTRL_MITIGATIONS_MASK     (SPEC_CTRL_IBRS | SPEC_CTRL_STIBP | SPEC_CTRL_SSBD \
-                                                       | SPEC_CTRL_RRSBA_DIS_S)
+                                                       | SPEC_CTRL_RRSBA_DIS_S \
+                                                       | SPEC_CTRL_BHI_DIS_S)
 
 #define MSR_IA32_PRED_CMD              0x00000049 /* Prediction Command */
 #define PRED_CMD_IBPB                  BIT(0)     /* Indirect Branch Prediction Barrier */
                                                 * are restricted to targets in
                                                 * kernel.
                                                 */
+#define ARCH_CAP_BHI_NO                        BIT(20) /*
+                                                * CPU is not affected by Branch
+                                                * History Injection.
+                                                */
 #define ARCH_CAP_PBRSB_NO              BIT(24) /*
                                                 * Not susceptible to Post-Barrier
                                                 * Return Stack Buffer Predictions.
index 170c89ed22fcd3a27106d166a9e7f5a5d1fadf80..ff5f1ecc7d1e6512fcc34f4a6e5df5976e9087f0 100644 (file)
        ALTERNATIVE "", __stringify(verw _ASM_RIP(mds_verw_sel)), X86_FEATURE_CLEAR_CPU_BUF
 .endm
 
+#ifdef CONFIG_X86_64
+.macro CLEAR_BRANCH_HISTORY
+       ALTERNATIVE "", "call clear_bhb_loop", X86_FEATURE_CLEAR_BHB_LOOP
+.endm
+
+.macro CLEAR_BRANCH_HISTORY_VMEXIT
+       ALTERNATIVE "", "call clear_bhb_loop", X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT
+.endm
+#else
+#define CLEAR_BRANCH_HISTORY
+#define CLEAR_BRANCH_HISTORY_VMEXIT
+#endif
+
 #else /* __ASSEMBLY__ */
 
 #define ANNOTATE_RETPOLINE_SAFE                                        \
@@ -368,6 +381,10 @@ extern void srso_alias_return_thunk(void);
 extern void entry_untrain_ret(void);
 extern void entry_ibpb(void);
 
+#ifdef CONFIG_X86_64
+extern void clear_bhb_loop(void);
+#endif
+
 extern void (*x86_return_thunk)(void);
 
 extern void __warn_thunk(void);
index 07e125f32528394df24c06de44e3232eaf15e4cd..7f57382afee41754beb8164244f199e4ac30a148 100644 (file)
@@ -228,7 +228,6 @@ int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct sn
 void snp_accept_memory(phys_addr_t start, phys_addr_t end);
 u64 snp_get_unsupported_features(u64 status);
 u64 sev_get_status(void);
-void kdump_sev_callback(void);
 void sev_show_status(void);
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
@@ -258,7 +257,6 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
 static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
 static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
 static inline u64 sev_get_status(void) { return 0; }
-static inline void kdump_sev_callback(void) { }
 static inline void sev_show_status(void) { }
 #endif
 
@@ -270,6 +268,7 @@ int psmash(u64 pfn);
 int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, u32 asid, bool immutable);
 int rmp_make_shared(u64 pfn, enum pg_level level);
 void snp_leak_pages(u64 pfn, unsigned int npages);
+void kdump_sev_callback(void);
 #else
 static inline bool snp_probe_rmptable_info(void) { return false; }
 static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
@@ -282,6 +281,7 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, u32 as
 }
 static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
 static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
+static inline void kdump_sev_callback(void) { }
 #endif
 
 #endif
index f44e2f9ab65d779f35bac9c5e58dd8b694778efc..2fc7bc3863ff6f7a932ac2ee05682a2ba71f3308 100644 (file)
 #include <asm/thread_info.h>   /* for TS_COMPAT */
 #include <asm/unistd.h>
 
+/* This is used purely for kernel/trace/trace_syscalls.c */
 typedef long (*sys_call_ptr_t)(const struct pt_regs *);
 extern const sys_call_ptr_t sys_call_table[];
 
-#if defined(CONFIG_X86_32)
-#define ia32_sys_call_table sys_call_table
-#else
 /*
  * These may not exist, but still put the prototypes in so we
  * can use IS_ENABLED().
  */
-extern const sys_call_ptr_t ia32_sys_call_table[];
-extern const sys_call_ptr_t x32_sys_call_table[];
-#endif
+extern long ia32_sys_call(const struct pt_regs *, unsigned int nr);
+extern long x32_sys_call(const struct pt_regs *, unsigned int nr);
+extern long x64_sys_call(const struct pt_regs *, unsigned int nr);
 
 /*
  * Only the low 32 bits of orig_ax are meaningful, so we return int.
@@ -127,6 +125,7 @@ static inline int syscall_get_arch(struct task_struct *task)
 }
 
 bool do_syscall_64(struct pt_regs *regs, int nr);
+void do_int80_emulation(struct pt_regs *regs);
 
 #endif /* CONFIG_X86_32 */
 
index 6d8677e80ddbb17c94ec7fcda9e5bae502c0dcb2..9bf17c9c29dad2e3f3c38c07253accc667cadea3 100644 (file)
@@ -345,6 +345,28 @@ static void srat_detect_node(struct cpuinfo_x86 *c)
 #endif
 }
 
+static void bsp_determine_snp(struct cpuinfo_x86 *c)
+{
+#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
+       cc_vendor = CC_VENDOR_AMD;
+
+       if (cpu_has(c, X86_FEATURE_SEV_SNP)) {
+               /*
+                * RMP table entry format is not architectural and is defined by the
+                * per-processor PPR. Restrict SNP support on the known CPU models
+                * for which the RMP table entry format is currently defined.
+                */
+               if (!cpu_has(c, X86_FEATURE_HYPERVISOR) &&
+                   c->x86 >= 0x19 && snp_probe_rmptable_info()) {
+                       cc_platform_set(CC_ATTR_HOST_SEV_SNP);
+               } else {
+                       setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
+                       cc_platform_clear(CC_ATTR_HOST_SEV_SNP);
+               }
+       }
+#endif
+}
+
 static void bsp_init_amd(struct cpuinfo_x86 *c)
 {
        if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
@@ -452,21 +474,7 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
                break;
        }
 
-       if (cpu_has(c, X86_FEATURE_SEV_SNP)) {
-               /*
-                * RMP table entry format is not architectural and it can vary by processor
-                * and is defined by the per-processor PPR. Restrict SNP support on the
-                * known CPU model and family for which the RMP table entry format is
-                * currently defined for.
-                */
-               if (!boot_cpu_has(X86_FEATURE_ZEN3) &&
-                   !boot_cpu_has(X86_FEATURE_ZEN4) &&
-                   !boot_cpu_has(X86_FEATURE_ZEN5))
-                       setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
-               else if (!snp_probe_rmptable_info())
-                       setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
-       }
-
+       bsp_determine_snp(c);
        return;
 
 warn:
index e7ba936d798b8198f5837118d5bb33d40389ccc7..295463707e68181cb536f8f4bd763bf045936202 100644 (file)
@@ -1607,6 +1607,79 @@ static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_
        dump_stack();
 }
 
+/*
+ * Set BHI_DIS_S to prevent indirect branches in the kernel from being influenced
+ * branch history in userspace. Not needed if BHI_NO is set.
+ */
+static bool __init spec_ctrl_bhi_dis(void)
+{
+       if (!boot_cpu_has(X86_FEATURE_BHI_CTRL))
+               return false;
+
+       x86_spec_ctrl_base |= SPEC_CTRL_BHI_DIS_S;
+       update_spec_ctrl(x86_spec_ctrl_base);
+       setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_HW);
+
+       return true;
+}
+
+enum bhi_mitigations {
+       BHI_MITIGATION_OFF,
+       BHI_MITIGATION_ON,
+       BHI_MITIGATION_AUTO,
+};
+
+static enum bhi_mitigations bhi_mitigation __ro_after_init =
+       IS_ENABLED(CONFIG_SPECTRE_BHI_ON)  ? BHI_MITIGATION_ON  :
+       IS_ENABLED(CONFIG_SPECTRE_BHI_OFF) ? BHI_MITIGATION_OFF :
+                                            BHI_MITIGATION_AUTO;
+
+static int __init spectre_bhi_parse_cmdline(char *str)
+{
+       if (!str)
+               return -EINVAL;
+
+       if (!strcmp(str, "off"))
+               bhi_mitigation = BHI_MITIGATION_OFF;
+       else if (!strcmp(str, "on"))
+               bhi_mitigation = BHI_MITIGATION_ON;
+       else if (!strcmp(str, "auto"))
+               bhi_mitigation = BHI_MITIGATION_AUTO;
+       else
+               pr_err("Ignoring unknown spectre_bhi option (%s)", str);
+
+       return 0;
+}
+early_param("spectre_bhi", spectre_bhi_parse_cmdline);
+
+static void __init bhi_select_mitigation(void)
+{
+       if (bhi_mitigation == BHI_MITIGATION_OFF)
+               return;
+
+       /* Retpoline mitigates BHI unless the CPU has RRSBA behavior */
+       if (cpu_feature_enabled(X86_FEATURE_RETPOLINE) &&
+           !(x86_read_arch_cap_msr() & ARCH_CAP_RRSBA))
+               return;
+
+       if (spec_ctrl_bhi_dis())
+               return;
+
+       if (!IS_ENABLED(CONFIG_X86_64))
+               return;
+
+       /* Mitigate KVM by default */
+       setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
+       pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit\n");
+
+       if (bhi_mitigation == BHI_MITIGATION_AUTO)
+               return;
+
+       /* Mitigate syscalls when the mitigation is forced =on */
+       setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP);
+       pr_info("Spectre BHI mitigation: SW BHB clearing on syscall\n");
+}
+
 static void __init spectre_v2_select_mitigation(void)
 {
        enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
@@ -1718,6 +1791,9 @@ static void __init spectre_v2_select_mitigation(void)
            mode == SPECTRE_V2_RETPOLINE)
                spec_ctrl_disable_kernel_rrsba();
 
+       if (boot_cpu_has(X86_BUG_BHI))
+               bhi_select_mitigation();
+
        spectre_v2_enabled = mode;
        pr_info("%s\n", spectre_v2_strings[mode]);
 
@@ -2695,15 +2771,15 @@ static char *stibp_state(void)
 
        switch (spectre_v2_user_stibp) {
        case SPECTRE_V2_USER_NONE:
-               return ", STIBP: disabled";
+               return "; STIBP: disabled";
        case SPECTRE_V2_USER_STRICT:
-               return ", STIBP: forced";
+               return "; STIBP: forced";
        case SPECTRE_V2_USER_STRICT_PREFERRED:
-               return ", STIBP: always-on";
+               return "; STIBP: always-on";
        case SPECTRE_V2_USER_PRCTL:
        case SPECTRE_V2_USER_SECCOMP:
                if (static_key_enabled(&switch_to_cond_stibp))
-                       return ", STIBP: conditional";
+                       return "; STIBP: conditional";
        }
        return "";
 }
@@ -2712,10 +2788,10 @@ static char *ibpb_state(void)
 {
        if (boot_cpu_has(X86_FEATURE_IBPB)) {
                if (static_key_enabled(&switch_mm_always_ibpb))
-                       return ", IBPB: always-on";
+                       return "; IBPB: always-on";
                if (static_key_enabled(&switch_mm_cond_ibpb))
-                       return ", IBPB: conditional";
-               return ", IBPB: disabled";
+                       return "; IBPB: conditional";
+               return "; IBPB: disabled";
        }
        return "";
 }
@@ -2725,14 +2801,31 @@ static char *pbrsb_eibrs_state(void)
        if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
                if (boot_cpu_has(X86_FEATURE_RSB_VMEXIT_LITE) ||
                    boot_cpu_has(X86_FEATURE_RSB_VMEXIT))
-                       return ", PBRSB-eIBRS: SW sequence";
+                       return "; PBRSB-eIBRS: SW sequence";
                else
-                       return ", PBRSB-eIBRS: Vulnerable";
+                       return "; PBRSB-eIBRS: Vulnerable";
        } else {
-               return ", PBRSB-eIBRS: Not affected";
+               return "; PBRSB-eIBRS: Not affected";
        }
 }
 
+static const char *spectre_bhi_state(void)
+{
+       if (!boot_cpu_has_bug(X86_BUG_BHI))
+               return "; BHI: Not affected";
+       else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_HW))
+               return "; BHI: BHI_DIS_S";
+       else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP))
+               return "; BHI: SW loop, KVM: SW loop";
+       else if (boot_cpu_has(X86_FEATURE_RETPOLINE) &&
+                !(x86_read_arch_cap_msr() & ARCH_CAP_RRSBA))
+               return "; BHI: Retpoline";
+       else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT))
+               return "; BHI: Syscall hardening, KVM: SW loop";
+
+       return "; BHI: Vulnerable (Syscall hardening enabled)";
+}
+
 static ssize_t spectre_v2_show_state(char *buf)
 {
        if (spectre_v2_enabled == SPECTRE_V2_LFENCE)
@@ -2745,13 +2838,15 @@ static ssize_t spectre_v2_show_state(char *buf)
            spectre_v2_enabled == SPECTRE_V2_EIBRS_LFENCE)
                return sysfs_emit(buf, "Vulnerable: eIBRS+LFENCE with unprivileged eBPF and SMT\n");
 
-       return sysfs_emit(buf, "%s%s%s%s%s%s%s\n",
+       return sysfs_emit(buf, "%s%s%s%s%s%s%s%s\n",
                          spectre_v2_strings[spectre_v2_enabled],
                          ibpb_state(),
-                         boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "",
+                         boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? "; IBRS_FW" : "",
                          stibp_state(),
-                         boot_cpu_has(X86_FEATURE_RSB_CTXSW) ? ", RSB filling" : "",
+                         boot_cpu_has(X86_FEATURE_RSB_CTXSW) ? "; RSB filling" : "",
                          pbrsb_eibrs_state(),
+                         spectre_bhi_state(),
+                         /* this should always be at the end */
                          spectre_v2_module_string());
 }
 
index 5c1e6d6be267af3e7b489e9f71937e7be6b25448..754d91857d634a2c6055ed50afa659ac45749086 100644 (file)
@@ -1120,6 +1120,7 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
 #define NO_SPECTRE_V2          BIT(8)
 #define NO_MMIO                        BIT(9)
 #define NO_EIBRS_PBRSB         BIT(10)
+#define NO_BHI                 BIT(11)
 
 #define VULNWL(vendor, family, model, whitelist)       \
        X86_MATCH_VENDOR_FAM_MODEL(vendor, family, model, whitelist)
@@ -1182,18 +1183,18 @@ static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = {
        VULNWL_INTEL(ATOM_TREMONT_D,            NO_ITLB_MULTIHIT | NO_EIBRS_PBRSB),
 
        /* AMD Family 0xf - 0x12 */
-       VULNWL_AMD(0x0f,        NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
-       VULNWL_AMD(0x10,        NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
-       VULNWL_AMD(0x11,        NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
-       VULNWL_AMD(0x12,        NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
+       VULNWL_AMD(0x0f,        NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO | NO_BHI),
+       VULNWL_AMD(0x10,        NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO | NO_BHI),
+       VULNWL_AMD(0x11,        NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO | NO_BHI),
+       VULNWL_AMD(0x12,        NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO | NO_BHI),
 
        /* FAMILY_ANY must be last, otherwise 0x0f - 0x12 matches won't work */
-       VULNWL_AMD(X86_FAMILY_ANY,      NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO | NO_EIBRS_PBRSB),
-       VULNWL_HYGON(X86_FAMILY_ANY,    NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO | NO_EIBRS_PBRSB),
+       VULNWL_AMD(X86_FAMILY_ANY,      NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO | NO_EIBRS_PBRSB | NO_BHI),
+       VULNWL_HYGON(X86_FAMILY_ANY,    NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO | NO_EIBRS_PBRSB | NO_BHI),
 
        /* Zhaoxin Family 7 */
-       VULNWL(CENTAUR, 7, X86_MODEL_ANY,       NO_SPECTRE_V2 | NO_SWAPGS | NO_MMIO),
-       VULNWL(ZHAOXIN, 7, X86_MODEL_ANY,       NO_SPECTRE_V2 | NO_SWAPGS | NO_MMIO),
+       VULNWL(CENTAUR, 7, X86_MODEL_ANY,       NO_SPECTRE_V2 | NO_SWAPGS | NO_MMIO | NO_BHI),
+       VULNWL(ZHAOXIN, 7, X86_MODEL_ANY,       NO_SPECTRE_V2 | NO_SWAPGS | NO_MMIO | NO_BHI),
        {}
 };
 
@@ -1435,6 +1436,13 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
        if (vulnerable_to_rfds(ia32_cap))
                setup_force_cpu_bug(X86_BUG_RFDS);
 
+       /* When virtualized, eIBRS could be hidden, assume vulnerable */
+       if (!(ia32_cap & ARCH_CAP_BHI_NO) &&
+           !cpu_matches(cpu_vuln_whitelist, NO_BHI) &&
+           (boot_cpu_has(X86_FEATURE_IBRS_ENHANCED) ||
+            boot_cpu_has(X86_FEATURE_HYPERVISOR)))
+               setup_force_cpu_bug(X86_BUG_BHI);
+
        if (cpu_matches(cpu_vuln_whitelist, NO_MELTDOWN))
                return;
 
index b5cc557cfc3736708d96be6372f34c9ad0be85e4..84d41be6d06ba4e79f49ae069f5f1d5ae20b00de 100644 (file)
@@ -2500,12 +2500,14 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,
                return -EINVAL;
 
        b = &per_cpu(mce_banks_array, s->id)[bank];
-
        if (!b->init)
                return -ENODEV;
 
        b->ctl = new;
+
+       mutex_lock(&mce_sysfs_mutex);
        mce_restart();
+       mutex_unlock(&mce_sysfs_mutex);
 
        return size;
 }
index 422a4ddc2ab7c9408f1d2d21433fea7f320c6f85..7b29ebda024f4e69bc9a9326f9cecd8a86ee2abb 100644 (file)
@@ -108,7 +108,7 @@ static inline void k8_check_syscfg_dram_mod_en(void)
              (boot_cpu_data.x86 >= 0x0f)))
                return;
 
-       if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+       if (cc_platform_has(CC_ATTR_HOST_SEV_SNP))
                return;
 
        rdmsr(MSR_AMD64_SYSCFG, lo, hi);
index c99f26ebe7a6537a7cd43274701ac4f489648081..1a8687f8073a89f3335038c8b3ba7e2ee45aa6d4 100644 (file)
@@ -78,7 +78,8 @@ cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
        else
                cpu = cpumask_any_but(mask, exclude_cpu);
 
-       if (!IS_ENABLED(CONFIG_NO_HZ_FULL))
+       /* Only continue if tick_nohz_full_mask has been initialized. */
+       if (!tick_nohz_full_enabled())
                return cpu;
 
        /* If the CPU picked isn't marked nohz_full nothing more needs doing. */
index a515328d9d7d88b802f588bf678d098e0ba53b86..af5aa2c754c22226080870967d6c410067c86447 100644 (file)
@@ -28,6 +28,7 @@ static const struct cpuid_bit cpuid_bits[] = {
        { X86_FEATURE_EPB,              CPUID_ECX,  3, 0x00000006, 0 },
        { X86_FEATURE_INTEL_PPIN,       CPUID_EBX,  0, 0x00000007, 1 },
        { X86_FEATURE_RRSBA_CTRL,       CPUID_EDX,  2, 0x00000007, 2 },
+       { X86_FEATURE_BHI_CTRL,         CPUID_EDX,  4, 0x00000007, 2 },
        { X86_FEATURE_CQM_LLC,          CPUID_EDX,  1, 0x0000000f, 0 },
        { X86_FEATURE_CQM_OCCUP_LLC,    CPUID_EDX,  0, 0x0000000f, 1 },
        { X86_FEATURE_CQM_MBM_TOTAL,    CPUID_EDX,  1, 0x0000000f, 1 },
index 0109e6c510e02fcced0e2bbbfedc6af8e48ba90e..e125e059e2c45d3e6657716777426b926d950aee 100644 (file)
@@ -35,6 +35,7 @@
 #include <asm/bios_ebda.h>
 #include <asm/bugs.h>
 #include <asm/cacheinfo.h>
+#include <asm/coco.h>
 #include <asm/cpu.h>
 #include <asm/efi.h>
 #include <asm/gart.h>
@@ -991,6 +992,7 @@ void __init setup_arch(char **cmdline_p)
         * memory size.
         */
        mem_encrypt_setup_arch();
+       cc_random_init();
 
        efi_fake_memmap();
        efi_find_mirror();
index 7e1e63cc48e67dcc3f3abc84362ba805fef5a93d..38ad066179d81fe7efc2b2469fbe583c12586e06 100644 (file)
@@ -2284,16 +2284,6 @@ static int __init snp_init_platform_device(void)
 }
 device_initcall(snp_init_platform_device);
 
-void kdump_sev_callback(void)
-{
-       /*
-        * Do wbinvd() on remote CPUs when SNP is enabled in order to
-        * safely do SNP_SHUTDOWN on the local CPU.
-        */
-       if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
-               wbinvd();
-}
-
 void sev_show_status(void)
 {
        int i;
index 3aaf7e86a859a2f680625b4d13b1c27d43fa5629..0ebdd088f28b852261786cb5f90d285b2af42ebc 100644 (file)
@@ -122,6 +122,7 @@ config KVM_AMD_SEV
        default y
        depends on KVM_AMD && X86_64
        depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
+       select ARCH_HAS_CC_PLATFORM
        help
          Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
          with Encrypted State (SEV-ES) on AMD processors.
index aadefcaa9561d0a31e589784da7e871e4a0de2e0..2f4e155080badc5efdbcc93fbc909c5bbcf70094 100644 (file)
@@ -52,7 +52,7 @@ enum kvm_only_cpuid_leafs {
 #define X86_FEATURE_IPRED_CTRL         KVM_X86_FEATURE(CPUID_7_2_EDX, 1)
 #define KVM_X86_FEATURE_RRSBA_CTRL     KVM_X86_FEATURE(CPUID_7_2_EDX, 2)
 #define X86_FEATURE_DDPD_U             KVM_X86_FEATURE(CPUID_7_2_EDX, 3)
-#define X86_FEATURE_BHI_CTRL           KVM_X86_FEATURE(CPUID_7_2_EDX, 4)
+#define KVM_X86_FEATURE_BHI_CTRL       KVM_X86_FEATURE(CPUID_7_2_EDX, 4)
 #define X86_FEATURE_MCDT_NO            KVM_X86_FEATURE(CPUID_7_2_EDX, 5)
 
 /* CPUID level 0x80000007 (EDX). */
@@ -102,10 +102,12 @@ static const struct cpuid_reg reverse_cpuid[] = {
  */
 static __always_inline void reverse_cpuid_check(unsigned int x86_leaf)
 {
+       BUILD_BUG_ON(NR_CPUID_WORDS != NCAPINTS);
        BUILD_BUG_ON(x86_leaf == CPUID_LNX_1);
        BUILD_BUG_ON(x86_leaf == CPUID_LNX_2);
        BUILD_BUG_ON(x86_leaf == CPUID_LNX_3);
        BUILD_BUG_ON(x86_leaf == CPUID_LNX_4);
+       BUILD_BUG_ON(x86_leaf == CPUID_LNX_5);
        BUILD_BUG_ON(x86_leaf >= ARRAY_SIZE(reverse_cpuid));
        BUILD_BUG_ON(reverse_cpuid[x86_leaf].function == 0);
 }
@@ -126,6 +128,7 @@ static __always_inline u32 __feature_translate(int x86_feature)
        KVM_X86_TRANSLATE_FEATURE(CONSTANT_TSC);
        KVM_X86_TRANSLATE_FEATURE(PERFMON_V2);
        KVM_X86_TRANSLATE_FEATURE(RRSBA_CTRL);
+       KVM_X86_TRANSLATE_FEATURE(BHI_CTRL);
        default:
                return x86_feature;
        }
index e5a4d9b0e79fd23e2dfc8224aa7a89d3b243c103..61a7531d41b019a7f263b9c4e02ccfdf960dd09f 100644 (file)
@@ -3184,7 +3184,7 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
        unsigned long pfn;
        struct page *p;
 
-       if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+       if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
                return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
 
        /*
index 2bfbf758d06110f49c71a22c1f54da9d9499669a..f6986dee6f8c7c52622857f131adf766d1528121 100644 (file)
@@ -275,6 +275,8 @@ SYM_INNER_LABEL_ALIGN(vmx_vmexit, SYM_L_GLOBAL)
 
        call vmx_spec_ctrl_restore_host
 
+       CLEAR_BRANCH_HISTORY_VMEXIT
+
        /* Put return value in AX */
        mov %_ASM_BX, %_ASM_AX
 
index 47d9f03b7778373393b9853fe32b153dadd9de29..984ea2089efc3132154527508976879a4e11cb10 100644 (file)
@@ -1621,7 +1621,7 @@ static bool kvm_is_immutable_feature_msr(u32 msr)
         ARCH_CAP_PSCHANGE_MC_NO | ARCH_CAP_TSX_CTRL_MSR | ARCH_CAP_TAA_NO | \
         ARCH_CAP_SBDR_SSDP_NO | ARCH_CAP_FBSDP_NO | ARCH_CAP_PSDP_NO | \
         ARCH_CAP_FB_CLEAR | ARCH_CAP_RRSBA | ARCH_CAP_PBRSB_NO | ARCH_CAP_GDS_NO | \
-        ARCH_CAP_RFDS_NO | ARCH_CAP_RFDS_CLEAR)
+        ARCH_CAP_RFDS_NO | ARCH_CAP_RFDS_CLEAR | ARCH_CAP_BHI_NO)
 
 static u64 kvm_get_arch_capabilities(void)
 {
index 0795b3464058b0e515cb71d7bf84fa9908e6fc07..e674ccf720b9f6befe6ffb0fccec192fb1aa9a89 100644 (file)
@@ -229,6 +229,7 @@ SYM_CODE_END(srso_return_thunk)
 /* Dummy for the alternative in CALL_UNTRAIN_RET. */
 SYM_CODE_START(srso_alias_untrain_ret)
        ANNOTATE_UNRET_SAFE
+       ANNOTATE_NOENDBR
        ret
        int3
 SYM_FUNC_END(srso_alias_untrain_ret)
index 104544359d69cd20ef1e37449d32a687ee1f3433..025fd7ea5d69f5bfba07a712a14ef85671d68af5 100644 (file)
@@ -24,6 +24,7 @@
 
 #include <linux/memblock.h>
 #include <linux/init.h>
+#include <asm/pgtable_areas.h>
 
 #include "numa_internal.h"
 
index 0d72183b5dd028ad83b98b38183af643ad3b21b5..36b603d0cddefc7b208873e8d435cdf1ddbe20d9 100644 (file)
@@ -947,6 +947,38 @@ static void free_pfn_range(u64 paddr, unsigned long size)
                memtype_free(paddr, paddr + size);
 }
 
+static int get_pat_info(struct vm_area_struct *vma, resource_size_t *paddr,
+               pgprot_t *pgprot)
+{
+       unsigned long prot;
+
+       VM_WARN_ON_ONCE(!(vma->vm_flags & VM_PAT));
+
+       /*
+        * We need the starting PFN and cachemode used for track_pfn_remap()
+        * that covered the whole VMA. For most mappings, we can obtain that
+        * information from the page tables. For COW mappings, we might now
+        * suddenly have anon folios mapped and follow_phys() will fail.
+        *
+        * Fall back to using vma->vm_pgoff, see remap_pfn_range_notrack(), to
+        * detect the PFN. If we need the cachemode as well, we're out of luck
+        * for now and have to fail fork().
+        */
+       if (!follow_phys(vma, vma->vm_start, 0, &prot, paddr)) {
+               if (pgprot)
+                       *pgprot = __pgprot(prot);
+               return 0;
+       }
+       if (is_cow_mapping(vma->vm_flags)) {
+               if (pgprot)
+                       return -EINVAL;
+               *paddr = (resource_size_t)vma->vm_pgoff << PAGE_SHIFT;
+               return 0;
+       }
+       WARN_ON_ONCE(1);
+       return -EINVAL;
+}
+
 /*
  * track_pfn_copy is called when vma that is covering the pfnmap gets
  * copied through copy_page_range().
@@ -957,20 +989,13 @@ static void free_pfn_range(u64 paddr, unsigned long size)
 int track_pfn_copy(struct vm_area_struct *vma)
 {
        resource_size_t paddr;
-       unsigned long prot;
        unsigned long vma_size = vma->vm_end - vma->vm_start;
        pgprot_t pgprot;
 
        if (vma->vm_flags & VM_PAT) {
-               /*
-                * reserve the whole chunk covered by vma. We need the
-                * starting address and protection from pte.
-                */
-               if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) {
-                       WARN_ON_ONCE(1);
+               if (get_pat_info(vma, &paddr, &pgprot))
                        return -EINVAL;
-               }
-               pgprot = __pgprot(prot);
+               /* reserve the whole chunk covered by vma. */
                return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
        }
 
@@ -1045,7 +1070,6 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
                 unsigned long size, bool mm_wr_locked)
 {
        resource_size_t paddr;
-       unsigned long prot;
 
        if (vma && !(vma->vm_flags & VM_PAT))
                return;
@@ -1053,11 +1077,8 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
        /* free the chunk starting from pfn or the whole chunk */
        paddr = (resource_size_t)pfn << PAGE_SHIFT;
        if (!paddr && !size) {
-               if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) {
-                       WARN_ON_ONCE(1);
+               if (get_pat_info(vma, &paddr, NULL))
                        return;
-               }
-
                size = vma->vm_end - vma->vm_start;
        }
        free_pfn_range(paddr, size);
index cffe1157a90acfcf741b31ac216d6fd3a9ed4fd2..ab0e8448bb6eb2bfbc4fab29321cd0ddbe876f7e 100644 (file)
@@ -77,7 +77,7 @@ static int __mfd_enable(unsigned int cpu)
 {
        u64 val;
 
-       if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+       if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
                return 0;
 
        rdmsrl(MSR_AMD64_SYSCFG, val);
@@ -98,7 +98,7 @@ static int __snp_enable(unsigned int cpu)
 {
        u64 val;
 
-       if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+       if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
                return 0;
 
        rdmsrl(MSR_AMD64_SYSCFG, val);
@@ -174,11 +174,11 @@ static int __init snp_rmptable_init(void)
        u64 rmptable_size;
        u64 val;
 
-       if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+       if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
                return 0;
 
        if (!amd_iommu_snp_en)
-               return 0;
+               goto nosnp;
 
        if (!probed_rmp_size)
                goto nosnp;
@@ -225,7 +225,7 @@ skip_enable:
        return 0;
 
 nosnp:
-       setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
+       cc_platform_clear(CC_ATTR_HOST_SEV_SNP);
        return -ENOSYS;
 }
 
@@ -246,7 +246,7 @@ static struct rmpentry *__snp_lookup_rmpentry(u64 pfn, int *level)
 {
        struct rmpentry *large_entry, *entry;
 
-       if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+       if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
                return ERR_PTR(-ENODEV);
 
        entry = get_rmpentry(pfn);
@@ -363,7 +363,7 @@ int psmash(u64 pfn)
        unsigned long paddr = pfn << PAGE_SHIFT;
        int ret;
 
-       if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+       if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
                return -ENODEV;
 
        if (!pfn_valid(pfn))
@@ -472,7 +472,7 @@ static int rmpupdate(u64 pfn, struct rmp_state *state)
        unsigned long paddr = pfn << PAGE_SHIFT;
        int ret, level;
 
-       if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+       if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
                return -ENODEV;
 
        level = RMP_TO_PG_LEVEL(state->pagesize);
@@ -558,3 +558,13 @@ void snp_leak_pages(u64 pfn, unsigned int npages)
        spin_unlock(&snp_leaked_pages_list_lock);
 }
 EXPORT_SYMBOL_GPL(snp_leak_pages);
+
+void kdump_sev_callback(void)
+{
+       /*
+        * Do wbinvd() on remote CPUs when SNP is enabled in order to
+        * safely do SNP_SHUTDOWN on the local CPU.
+        */
+       if (cc_platform_has(CC_ATTR_HOST_SEV_SNP))
+               wbinvd();
+}
index 7a5f611c3d2e3e83eb00be12b9131f49d8348f5e..b8e32d933a6369aebbffe36e2ce3165cc2288bef 100644 (file)
@@ -583,9 +583,6 @@ static void bd_finish_claiming(struct block_device *bdev, void *holder,
        mutex_unlock(&bdev->bd_holder_lock);
        bd_clear_claiming(whole, holder);
        mutex_unlock(&bdev_lock);
-
-       if (hops && hops->get_holder)
-               hops->get_holder(holder);
 }
 
 /**
@@ -608,7 +605,6 @@ EXPORT_SYMBOL(bd_abort_claiming);
 static void bd_end_claim(struct block_device *bdev, void *holder)
 {
        struct block_device *whole = bdev_whole(bdev);
-       const struct blk_holder_ops *hops = bdev->bd_holder_ops;
        bool unblock = false;
 
        /*
@@ -631,9 +627,6 @@ static void bd_end_claim(struct block_device *bdev, void *holder)
                whole->bd_holder = NULL;
        mutex_unlock(&bdev_lock);
 
-       if (hops && hops->put_holder)
-               hops->put_holder(holder);
-
        /*
         * If this was the last claim, remove holder link and unblock evpoll if
         * it was a write holder.
@@ -776,17 +769,17 @@ void blkdev_put_no_open(struct block_device *bdev)
 
 static bool bdev_writes_blocked(struct block_device *bdev)
 {
-       return bdev->bd_writers == -1;
+       return bdev->bd_writers < 0;
 }
 
 static void bdev_block_writes(struct block_device *bdev)
 {
-       bdev->bd_writers = -1;
+       bdev->bd_writers--;
 }
 
 static void bdev_unblock_writes(struct block_device *bdev)
 {
-       bdev->bd_writers = 0;
+       bdev->bd_writers++;
 }
 
 static bool bdev_may_open(struct block_device *bdev, blk_mode_t mode)
@@ -813,6 +806,11 @@ static void bdev_claim_write_access(struct block_device *bdev, blk_mode_t mode)
                bdev->bd_writers++;
 }
 
+static inline bool bdev_unclaimed(const struct file *bdev_file)
+{
+       return bdev_file->private_data == BDEV_I(bdev_file->f_mapping->host);
+}
+
 static void bdev_yield_write_access(struct file *bdev_file)
 {
        struct block_device *bdev;
@@ -820,14 +818,15 @@ static void bdev_yield_write_access(struct file *bdev_file)
        if (bdev_allow_write_mounted)
                return;
 
+       if (bdev_unclaimed(bdev_file))
+               return;
+
        bdev = file_bdev(bdev_file);
-       /* Yield exclusive or shared write access. */
-       if (bdev_file->f_mode & FMODE_WRITE) {
-               if (bdev_writes_blocked(bdev))
-                       bdev_unblock_writes(bdev);
-               else
-                       bdev->bd_writers--;
-       }
+
+       if (bdev_file->f_mode & FMODE_WRITE_RESTRICTED)
+               bdev_unblock_writes(bdev);
+       else if (bdev_file->f_mode & FMODE_WRITE)
+               bdev->bd_writers--;
 }
 
 /**
@@ -907,6 +906,8 @@ int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
        bdev_file->f_mode |= FMODE_BUF_RASYNC | FMODE_CAN_ODIRECT;
        if (bdev_nowait(bdev))
                bdev_file->f_mode |= FMODE_NOWAIT;
+       if (mode & BLK_OPEN_RESTRICT_WRITES)
+               bdev_file->f_mode |= FMODE_WRITE_RESTRICTED;
        bdev_file->f_mapping = bdev->bd_inode->i_mapping;
        bdev_file->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
        bdev_file->private_data = holder;
@@ -1012,6 +1013,20 @@ struct file *bdev_file_open_by_path(const char *path, blk_mode_t mode,
 }
 EXPORT_SYMBOL(bdev_file_open_by_path);
 
+static inline void bd_yield_claim(struct file *bdev_file)
+{
+       struct block_device *bdev = file_bdev(bdev_file);
+       void *holder = bdev_file->private_data;
+
+       lockdep_assert_held(&bdev->bd_disk->open_mutex);
+
+       if (WARN_ON_ONCE(IS_ERR_OR_NULL(holder)))
+               return;
+
+       if (!bdev_unclaimed(bdev_file))
+               bd_end_claim(bdev, holder);
+}
+
 void bdev_release(struct file *bdev_file)
 {
        struct block_device *bdev = file_bdev(bdev_file);
@@ -1036,7 +1051,7 @@ void bdev_release(struct file *bdev_file)
        bdev_yield_write_access(bdev_file);
 
        if (holder)
-               bd_end_claim(bdev, holder);
+               bd_yield_claim(bdev_file);
 
        /*
         * Trigger event checking and tell drivers to flush MEDIA_CHANGE
@@ -1056,6 +1071,39 @@ put_no_open:
        blkdev_put_no_open(bdev);
 }
 
+/**
+ * bdev_fput - yield claim to the block device and put the file
+ * @bdev_file: open block device
+ *
+ * Yield claim on the block device and put the file. Ensure that the
+ * block device can be reclaimed before the file is closed, which is a
+ * deferred operation.
+ */
+void bdev_fput(struct file *bdev_file)
+{
+       if (WARN_ON_ONCE(bdev_file->f_op != &def_blk_fops))
+               return;
+
+       if (bdev_file->private_data) {
+               struct block_device *bdev = file_bdev(bdev_file);
+               struct gendisk *disk = bdev->bd_disk;
+
+               mutex_lock(&disk->open_mutex);
+               bdev_yield_write_access(bdev_file);
+               bd_yield_claim(bdev_file);
+               /*
+                * Tell release we already gave up our hold on the
+                * device and if write restrictions are available that
+                * we already gave up write access to the device.
+                */
+               bdev_file->private_data = BDEV_I(bdev_file->f_mapping->host);
+               mutex_unlock(&disk->open_mutex);
+       }
+
+       fput(bdev_file);
+}
+EXPORT_SYMBOL(bdev_fput);
+
 /**
  * lookup_bdev() - Look up a struct block_device by name.
  * @pathname: Name of the block device in the filesystem.
index 0c76137adcaaa5b9d212d789291d681c23c064f6..a9028a2c2db57b0881b7faf7cee2243148d1626c 100644 (file)
@@ -96,7 +96,7 @@ static int blk_ioctl_discard(struct block_device *bdev, blk_mode_t mode,
                unsigned long arg)
 {
        uint64_t range[2];
-       uint64_t start, len;
+       uint64_t start, len, end;
        struct inode *inode = bdev->bd_inode;
        int err;
 
@@ -117,7 +117,8 @@ static int blk_ioctl_discard(struct block_device *bdev, blk_mode_t mode,
        if (len & 511)
                return -EINVAL;
 
-       if (start + len > bdev_nr_bytes(bdev))
+       if (check_add_overflow(start, len, &end) ||
+           end > bdev_nr_bytes(bdev))
                return -EINVAL;
 
        filemap_invalidate_lock(inode->i_mapping);
index 302dce0b2b5044e20489f4b34bb8f4fde189e597..d67881b50bca28a1e08bb494b00c2bf0ee44957b 100644 (file)
@@ -662,14 +662,15 @@ static int acpi_thermal_register_thermal_zone(struct acpi_thermal *tz,
 {
        int result;
 
-       tz->thermal_zone = thermal_zone_device_register_with_trips("acpitz",
-                                                                  trip_table,
-                                                                  trip_count,
-                                                                  tz,
-                                                                  &acpi_thermal_zone_ops,
-                                                                  NULL,
-                                                                  passive_delay,
-                                                                  tz->polling_frequency * 100);
+       if (trip_count)
+               tz->thermal_zone = thermal_zone_device_register_with_trips(
+                                       "acpitz", trip_table, trip_count, tz,
+                                       &acpi_thermal_zone_ops, NULL, passive_delay,
+                                       tz->polling_frequency * 100);
+       else
+               tz->thermal_zone = thermal_tripless_zone_device_register(
+                                       "acpitz", tz, &acpi_thermal_zone_ops, NULL);
+
        if (IS_ERR(tz->thermal_zone))
                return PTR_ERR(tz->thermal_zone);
 
@@ -901,11 +902,8 @@ static int acpi_thermal_add(struct acpi_device *device)
                trip++;
        }
 
-       if (trip == trip_table) {
+       if (trip == trip_table)
                pr_warn(FW_BUG "No valid trip points!\n");
-               result = -ENODEV;
-               goto free_memory;
-       }
 
        result = acpi_thermal_register_thermal_zone(tz, trip_table,
                                                    trip - trip_table,
index d4a626f87963ba123a4f07a366c28681db4714fa..79a8b0aa37bf37fa8eb44e2dcba7181dfb2222b0 100644 (file)
@@ -30,7 +30,6 @@
 #define ST_AHCI_OOBR_CIMAX_SHIFT       0
 
 struct st_ahci_drv_data {
-       struct platform_device *ahci;
        struct reset_control *pwr;
        struct reset_control *sw_rst;
        struct reset_control *pwr_rst;
index 4ac854f6b05777c669d7de39ab006d963b74bd48..88b2e9817f49dfd200a0f58835a9344cab1e2818 100644 (file)
@@ -1371,9 +1371,6 @@ static struct pci_driver pata_macio_pci_driver = {
        .suspend        = pata_macio_pci_suspend,
        .resume         = pata_macio_pci_resume,
 #endif
-       .driver = {
-               .owner          = THIS_MODULE,
-       },
 };
 MODULE_DEVICE_TABLE(pci, pata_macio_pci_match);
 
index 400b22ee99c33affba7b25ae46de0a2014bfd71f..4c270999ba3ccd9dd70175b02886998cc47e99a9 100644 (file)
@@ -200,7 +200,10 @@ int gemini_sata_start_bridge(struct sata_gemini *sg, unsigned int bridge)
                pclk = sg->sata0_pclk;
        else
                pclk = sg->sata1_pclk;
-       clk_enable(pclk);
+       ret = clk_enable(pclk);
+       if (ret)
+               return ret;
+
        msleep(10);
 
        /* Do not keep clocking a bridge that is not online */
index e82786c63fbd73decc4af68d1a3aff1113411a27..9bec0aee92e04c412fec0abe4ac30173950890fb 100644 (file)
@@ -787,37 +787,6 @@ static const struct ata_port_info mv_port_info[] = {
        },
 };
 
-static const struct pci_device_id mv_pci_tbl[] = {
-       { PCI_VDEVICE(MARVELL, 0x5040), chip_504x },
-       { PCI_VDEVICE(MARVELL, 0x5041), chip_504x },
-       { PCI_VDEVICE(MARVELL, 0x5080), chip_5080 },
-       { PCI_VDEVICE(MARVELL, 0x5081), chip_508x },
-       /* RocketRAID 1720/174x have different identifiers */
-       { PCI_VDEVICE(TTI, 0x1720), chip_6042 },
-       { PCI_VDEVICE(TTI, 0x1740), chip_6042 },
-       { PCI_VDEVICE(TTI, 0x1742), chip_6042 },
-
-       { PCI_VDEVICE(MARVELL, 0x6040), chip_604x },
-       { PCI_VDEVICE(MARVELL, 0x6041), chip_604x },
-       { PCI_VDEVICE(MARVELL, 0x6042), chip_6042 },
-       { PCI_VDEVICE(MARVELL, 0x6080), chip_608x },
-       { PCI_VDEVICE(MARVELL, 0x6081), chip_608x },
-
-       { PCI_VDEVICE(ADAPTEC2, 0x0241), chip_604x },
-
-       /* Adaptec 1430SA */
-       { PCI_VDEVICE(ADAPTEC2, 0x0243), chip_7042 },
-
-       /* Marvell 7042 support */
-       { PCI_VDEVICE(MARVELL, 0x7042), chip_7042 },
-
-       /* Highpoint RocketRAID PCIe series */
-       { PCI_VDEVICE(TTI, 0x2300), chip_7042 },
-       { PCI_VDEVICE(TTI, 0x2310), chip_7042 },
-
-       { }                     /* terminate list */
-};
-
 static const struct mv_hw_ops mv5xxx_ops = {
        .phy_errata             = mv5_phy_errata,
        .enable_leds            = mv5_enable_leds,
@@ -4303,6 +4272,36 @@ static int mv_pci_init_one(struct pci_dev *pdev,
 static int mv_pci_device_resume(struct pci_dev *pdev);
 #endif
 
+static const struct pci_device_id mv_pci_tbl[] = {
+       { PCI_VDEVICE(MARVELL, 0x5040), chip_504x },
+       { PCI_VDEVICE(MARVELL, 0x5041), chip_504x },
+       { PCI_VDEVICE(MARVELL, 0x5080), chip_5080 },
+       { PCI_VDEVICE(MARVELL, 0x5081), chip_508x },
+       /* RocketRAID 1720/174x have different identifiers */
+       { PCI_VDEVICE(TTI, 0x1720), chip_6042 },
+       { PCI_VDEVICE(TTI, 0x1740), chip_6042 },
+       { PCI_VDEVICE(TTI, 0x1742), chip_6042 },
+
+       { PCI_VDEVICE(MARVELL, 0x6040), chip_604x },
+       { PCI_VDEVICE(MARVELL, 0x6041), chip_604x },
+       { PCI_VDEVICE(MARVELL, 0x6042), chip_6042 },
+       { PCI_VDEVICE(MARVELL, 0x6080), chip_608x },
+       { PCI_VDEVICE(MARVELL, 0x6081), chip_608x },
+
+       { PCI_VDEVICE(ADAPTEC2, 0x0241), chip_604x },
+
+       /* Adaptec 1430SA */
+       { PCI_VDEVICE(ADAPTEC2, 0x0243), chip_7042 },
+
+       /* Marvell 7042 support */
+       { PCI_VDEVICE(MARVELL, 0x7042), chip_7042 },
+
+       /* Highpoint RocketRAID PCIe series */
+       { PCI_VDEVICE(TTI, 0x2300), chip_7042 },
+       { PCI_VDEVICE(TTI, 0x2310), chip_7042 },
+
+       { }                     /* terminate list */
+};
 
 static struct pci_driver mv_pci_driver = {
        .name                   = DRV_NAME,
@@ -4315,6 +4314,7 @@ static struct pci_driver mv_pci_driver = {
 #endif
 
 };
+MODULE_DEVICE_TABLE(pci, mv_pci_tbl);
 
 /**
  *      mv_print_info - Dump key info to kernel log for perusal.
@@ -4487,7 +4487,6 @@ static void __exit mv_exit(void)
 MODULE_AUTHOR("Brett Russ");
 MODULE_DESCRIPTION("SCSI low-level driver for Marvell SATA controllers");
 MODULE_LICENSE("GPL v2");
-MODULE_DEVICE_TABLE(pci, mv_pci_tbl);
 MODULE_VERSION(DRV_VERSION);
 MODULE_ALIAS("platform:" DRV_NAME);
 
index b51d7a9d0d90ce0a6c72fe841aad222708c046e1..a482741eb181ffca923519ba7d8ab5e73da1e176 100644 (file)
@@ -957,8 +957,7 @@ static void pdc20621_get_from_dimm(struct ata_host *host, void *psource,
 
        offset -= (idx * window_size);
        idx++;
-       dist = ((long) (window_size - (offset + size))) >= 0 ? size :
-               (long) (window_size - offset);
+       dist = min(size, window_size - offset);
        memcpy_fromio(psource, dimm_mmio + offset / 4, dist);
 
        psource += dist;
@@ -1005,8 +1004,7 @@ static void pdc20621_put_to_dimm(struct ata_host *host, void *psource,
        readl(mmio + PDC_DIMM_WINDOW_CTLR);
        offset -= (idx * window_size);
        idx++;
-       dist = ((long)(s32)(window_size - (offset + size))) >= 0 ? size :
-               (long) (window_size - offset);
+       dist = min(size, window_size - offset);
        memcpy_toio(dimm_mmio + offset / 4, psource, dist);
        writel(0x01, mmio + PDC_GENERAL_CTLR);
        readl(mmio + PDC_GENERAL_CTLR);
index b93f3c5716aeeaa001651962f0af2c73bdfd5685..5f4e03336e68ef459f0df6c20348f8e1996956ba 100644 (file)
@@ -44,6 +44,7 @@ static bool fw_devlink_is_permissive(void);
 static void __fw_devlink_link_to_consumers(struct device *dev);
 static bool fw_devlink_drv_reg_done;
 static bool fw_devlink_best_effort;
+static struct workqueue_struct *device_link_wq;
 
 /**
  * __fwnode_link_add - Create a link between two fwnode_handles.
@@ -533,12 +534,26 @@ static void devlink_dev_release(struct device *dev)
        /*
         * It may take a while to complete this work because of the SRCU
         * synchronization in device_link_release_fn() and if the consumer or
-        * supplier devices get deleted when it runs, so put it into the "long"
-        * workqueue.
+        * supplier devices get deleted when it runs, so put it into the
+        * dedicated workqueue.
         */
-       queue_work(system_long_wq, &link->rm_work);
+       queue_work(device_link_wq, &link->rm_work);
 }
 
+/**
+ * device_link_wait_removal - Wait for ongoing devlink removal jobs to terminate
+ */
+void device_link_wait_removal(void)
+{
+       /*
+        * devlink removal jobs are queued in the dedicated work queue.
+        * To be sure that all removal jobs are terminated, ensure that any
+        * scheduled work has run to completion.
+        */
+       flush_workqueue(device_link_wq);
+}
+EXPORT_SYMBOL_GPL(device_link_wait_removal);
+
 static struct class devlink_class = {
        .name = "devlink",
        .dev_groups = devlink_groups,
@@ -4164,9 +4179,14 @@ int __init devices_init(void)
        sysfs_dev_char_kobj = kobject_create_and_add("char", dev_kobj);
        if (!sysfs_dev_char_kobj)
                goto char_kobj_err;
+       device_link_wq = alloc_workqueue("device_link_wq", 0, 0);
+       if (!device_link_wq)
+               goto wq_err;
 
        return 0;
 
+ wq_err:
+       kobject_put(sysfs_dev_char_kobj);
  char_kobj_err:
        kobject_put(sysfs_dev_block_kobj);
  block_kobj_err:
index 41edd6a430eb457a76db36e8b7e9c58758bf2fe4..55999a50ccc0b85bb688be0b36ae9af5384b2965 100644 (file)
@@ -112,7 +112,7 @@ static int regcache_maple_drop(struct regmap *map, unsigned int min,
        unsigned long *entry, *lower, *upper;
        unsigned long lower_index, lower_last;
        unsigned long upper_index, upper_last;
-       int ret;
+       int ret = 0;
 
        lower = NULL;
        upper = NULL;
@@ -145,7 +145,7 @@ static int regcache_maple_drop(struct regmap *map, unsigned int min,
                        upper_index = max + 1;
                        upper_last = mas.last;
 
-                       upper = kmemdup(&entry[max + 1],
+                       upper = kmemdup(&entry[max - mas.index + 1],
                                        ((mas.last - max) *
                                         sizeof(unsigned long)),
                                        map->alloc_flags);
@@ -244,7 +244,7 @@ static int regcache_maple_sync(struct regmap *map, unsigned int min,
        unsigned long lmin = min;
        unsigned long lmax = max;
        unsigned int r, v, sync_start;
-       int ret;
+       int ret = 0;
        bool sync_needed = false;
 
        map->cache_bypass = true;
index 71c39bcd872c7ecaabc67e91f35aa2fb267d6826..ed33cf7192d21672fb389a93c20fbbb887796337 100644 (file)
@@ -1965,10 +1965,10 @@ static int null_add_dev(struct nullb_device *dev)
 
 out_ida_free:
        ida_free(&nullb_indexes, nullb->index);
-out_cleanup_zone:
-       null_free_zoned_dev(dev);
 out_cleanup_disk:
        put_disk(nullb->disk);
+out_cleanup_zone:
+       null_free_zoned_dev(dev);
 out_cleanup_tags:
        if (nullb->tag_set == &nullb->__tag_set)
                blk_mq_free_tag_set(nullb->tag_set);
index f44efbb89c346a8e0b72c3d262ec9213c95aab18..2102377f727b1eecac8d28423e10a3e313b38683 100644 (file)
@@ -1090,7 +1090,7 @@ static int __sev_snp_init_locked(int *error)
        void *arg = &data;
        int cmd, rc = 0;
 
-       if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+       if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
                return -ENODEV;
 
        sev = psp->sev_data;
index 7bc71f4be64a07510507e1c9b7d0f1a61de30e3b..38d19410a2be68cab9f382d48ab7f15493c42af0 100644 (file)
@@ -2060,6 +2060,8 @@ static void bus_reset_work(struct work_struct *work)
 
        ohci->generation = generation;
        reg_write(ohci, OHCI1394_IntEventClear, OHCI1394_busReset);
+       if (param_debug & OHCI_PARAM_DEBUG_BUSRESETS)
+               reg_write(ohci, OHCI1394_IntMaskSet, OHCI1394_busReset);
 
        if (ohci->quirks & QUIRK_RESET_PACKET)
                ohci->request_generation = generation;
@@ -2125,12 +2127,14 @@ static irqreturn_t irq_handler(int irq, void *data)
                return IRQ_NONE;
 
        /*
-        * busReset and postedWriteErr must not be cleared yet
+        * busReset and postedWriteErr events must not be cleared yet
         * (OHCI 1.1 clauses 7.2.3.2 and 13.2.8.1)
         */
        reg_write(ohci, OHCI1394_IntEventClear,
                  event & ~(OHCI1394_busReset | OHCI1394_postedWriteErr));
        log_irqs(ohci, event);
+       if (event & OHCI1394_busReset)
+               reg_write(ohci, OHCI1394_IntMaskClear, OHCI1394_busReset);
 
        if (event & OHCI1394_selfIDComplete)
                queue_work(selfid_workqueue, &ohci->bus_reset_work);
index fa96356102510967cb30538375c9348d122c3910..d09c7d72836551ab510031179a9b95340cd3fb36 100644 (file)
@@ -728,6 +728,25 @@ static u32 line_event_id(int level)
                       GPIO_V2_LINE_EVENT_FALLING_EDGE;
 }
 
+static inline char *make_irq_label(const char *orig)
+{
+       char *new;
+
+       if (!orig)
+               return NULL;
+
+       new = kstrdup_and_replace(orig, '/', ':', GFP_KERNEL);
+       if (!new)
+               return ERR_PTR(-ENOMEM);
+
+       return new;
+}
+
+static inline void free_irq_label(const char *label)
+{
+       kfree(label);
+}
+
 #ifdef CONFIG_HTE
 
 static enum hte_return process_hw_ts_thread(void *p)
@@ -1015,6 +1034,7 @@ static int debounce_setup(struct line *line, unsigned int debounce_period_us)
 {
        unsigned long irqflags;
        int ret, level, irq;
+       char *label;
 
        /* try hardware */
        ret = gpiod_set_debounce(line->desc, debounce_period_us);
@@ -1037,11 +1057,17 @@ static int debounce_setup(struct line *line, unsigned int debounce_period_us)
                        if (irq < 0)
                                return -ENXIO;
 
+                       label = make_irq_label(line->req->label);
+                       if (IS_ERR(label))
+                               return -ENOMEM;
+
                        irqflags = IRQF_TRIGGER_FALLING | IRQF_TRIGGER_RISING;
                        ret = request_irq(irq, debounce_irq_handler, irqflags,
-                                         line->req->label, line);
-                       if (ret)
+                                         label, line);
+                       if (ret) {
+                               free_irq_label(label);
                                return ret;
+                       }
                        line->irq = irq;
                } else {
                        ret = hte_edge_setup(line, GPIO_V2_LINE_FLAG_EDGE_BOTH);
@@ -1083,16 +1109,6 @@ static u32 gpio_v2_line_config_debounce_period(struct gpio_v2_line_config *lc,
        return 0;
 }
 
-static inline char *make_irq_label(const char *orig)
-{
-       return kstrdup_and_replace(orig, '/', ':', GFP_KERNEL);
-}
-
-static inline void free_irq_label(const char *label)
-{
-       kfree(label);
-}
-
 static void edge_detector_stop(struct line *line)
 {
        if (line->irq) {
@@ -1158,8 +1174,8 @@ static int edge_detector_setup(struct line *line,
        irqflags |= IRQF_ONESHOT;
 
        label = make_irq_label(line->req->label);
-       if (!label)
-               return -ENOMEM;
+       if (IS_ERR(label))
+               return PTR_ERR(label);
 
        /* Request a thread to read the events */
        ret = request_threaded_irq(irq, edge_irq_handler, edge_irq_thread,
@@ -2217,8 +2233,8 @@ static int lineevent_create(struct gpio_device *gdev, void __user *ip)
                goto out_free_le;
 
        label = make_irq_label(le->label);
-       if (!label) {
-               ret = -ENOMEM;
+       if (IS_ERR(label)) {
+               ret = PTR_ERR(label);
                goto out_free_le;
        }
 
index 59ccf9a3e1539c93e705139fb2c4a1f7db166001..94903fc1c1459f9fd26eba62628037492e202620 100644 (file)
@@ -1175,6 +1175,9 @@ struct gpio_device *gpio_device_find(const void *data,
 
        list_for_each_entry_srcu(gdev, &gpio_devices, list,
                                 srcu_read_lock_held(&gpio_devices_srcu)) {
+               if (!device_is_registered(&gdev->dev))
+                       continue;
+
                guard(srcu)(&gdev->srcu);
 
                gc = srcu_dereference(gdev->chip, &gdev->srcu);
index bd61e20770a5be20b8978be47a3ba2eaae0c3289..14a2a8473682b00a84e5a0e3907969e719fa5019 100644 (file)
@@ -52,7 +52,7 @@
  * @adapter: I2C adapter for the DDC bus
  * @offset: register offset
  * @buffer: buffer for return data
- * @size: sizo of the buffer
+ * @size: size of the buffer
  *
  * Reads @size bytes from the DP dual mode adaptor registers
  * starting at @offset.
@@ -116,7 +116,7 @@ EXPORT_SYMBOL(drm_dp_dual_mode_read);
  * @adapter: I2C adapter for the DDC bus
  * @offset: register offset
  * @buffer: buffer for write data
- * @size: sizo of the buffer
+ * @size: size of the buffer
  *
  * Writes @size bytes to the DP dual mode adaptor registers
  * starting at @offset.
index 7352bde299d54767fecb34232cb5941a01d6ea88..03bd3c7bd0dc2cf833decec93ce2186cf955a9bd 100644 (file)
@@ -582,7 +582,12 @@ int drm_gem_map_attach(struct dma_buf *dma_buf,
 {
        struct drm_gem_object *obj = dma_buf->priv;
 
-       if (!obj->funcs->get_sg_table)
+       /*
+        * drm_gem_map_dma_buf() requires obj->get_sg_table(), but drivers
+        * that implement their own ->map_dma_buf() do not.
+        */
+       if (dma_buf->ops->map_dma_buf == drm_gem_map_dma_buf &&
+           !obj->funcs->get_sg_table)
                return -ENOSYS;
 
        return drm_gem_pin(obj);
index 4c2f85632391a669c35012439094250e5e9c6dc5..fba73c38e23569fa521e387484b96eadfb988d80 100644 (file)
@@ -118,6 +118,7 @@ gt-y += \
        gt/intel_ggtt_fencing.o \
        gt/intel_gt.o \
        gt/intel_gt_buffer_pool.o \
+       gt/intel_gt_ccs_mode.o \
        gt/intel_gt_clock_utils.o \
        gt/intel_gt_debugfs.o \
        gt/intel_gt_engines_debugfs.o \
index ab2f52d21bad8bad22c184cce6aefac8b0ce5a29..8af9e6128277af050fb95b2902615c4ce9678592 100644 (file)
@@ -2709,15 +2709,6 @@ static void intel_set_pipe_src_size(const struct intel_crtc_state *crtc_state)
         */
        intel_de_write(dev_priv, PIPESRC(pipe),
                       PIPESRC_WIDTH(width - 1) | PIPESRC_HEIGHT(height - 1));
-
-       if (!crtc_state->enable_psr2_su_region_et)
-               return;
-
-       width = drm_rect_width(&crtc_state->psr2_su_area);
-       height = drm_rect_height(&crtc_state->psr2_su_area);
-
-       intel_de_write(dev_priv, PIPE_SRCSZ_ERLY_TPT(pipe),
-                      PIPESRC_WIDTH(width - 1) | PIPESRC_HEIGHT(height - 1));
 }
 
 static bool intel_pipe_is_interlaced(const struct intel_crtc_state *crtc_state)
index fe42688137863ca8dc95057aa3fffe5ada026211..9b1bce2624b9ea1a8e45bcc5b76e4621703d073e 100644 (file)
@@ -47,6 +47,7 @@ struct drm_printer;
 #define HAS_DPT(i915)                  (DISPLAY_VER(i915) >= 13)
 #define HAS_DSB(i915)                  (DISPLAY_INFO(i915)->has_dsb)
 #define HAS_DSC(__i915)                        (DISPLAY_RUNTIME_INFO(__i915)->has_dsc)
+#define HAS_DSC_MST(__i915)            (DISPLAY_VER(__i915) >= 12 && HAS_DSC(__i915))
 #define HAS_FBC(i915)                  (DISPLAY_RUNTIME_INFO(i915)->fbc_mask != 0)
 #define HAS_FPGA_DBG_UNCLAIMED(i915)   (DISPLAY_INFO(i915)->has_fpga_dbg)
 #define HAS_FW_BLC(i915)               (DISPLAY_VER(i915) >= 3)
index 9104f18753b484fde2b439f85494fc77a3d27c87..bf3f942e19c3d38a314d2e5c5065dbb73b36682f 100644 (file)
@@ -1423,6 +1423,8 @@ struct intel_crtc_state {
 
        u32 psr2_man_track_ctl;
 
+       u32 pipe_srcsz_early_tpt;
+
        struct drm_rect psr2_su_area;
 
        /* Variable Refresh Rate state */
index f98ef4b42a448f57d5dfaf0459cba23a00946870..abd62bebc46d0e58d5bc78d8f4500ddcbc6098f1 100644 (file)
@@ -499,7 +499,7 @@ intel_dp_set_source_rates(struct intel_dp *intel_dp)
        /* The values must be in increasing order */
        static const int mtl_rates[] = {
                162000, 216000, 243000, 270000, 324000, 432000, 540000, 675000,
-               810000, 1000000, 1350000, 2000000,
+               810000, 1000000, 2000000,
        };
        static const int icl_rates[] = {
                162000, 216000, 270000, 324000, 432000, 540000, 648000, 810000,
@@ -1422,7 +1422,8 @@ static bool intel_dp_source_supports_fec(struct intel_dp *intel_dp,
        if (DISPLAY_VER(dev_priv) >= 12)
                return true;
 
-       if (DISPLAY_VER(dev_priv) == 11 && encoder->port != PORT_A)
+       if (DISPLAY_VER(dev_priv) == 11 && encoder->port != PORT_A &&
+           !intel_crtc_has_type(pipe_config, INTEL_OUTPUT_DP_MST))
                return true;
 
        return false;
@@ -1917,8 +1918,9 @@ icl_dsc_compute_link_config(struct intel_dp *intel_dp,
        dsc_max_bpp = min(dsc_max_bpp, pipe_bpp - 1);
 
        for (i = 0; i < ARRAY_SIZE(valid_dsc_bpp); i++) {
-               if (valid_dsc_bpp[i] < dsc_min_bpp ||
-                   valid_dsc_bpp[i] > dsc_max_bpp)
+               if (valid_dsc_bpp[i] < dsc_min_bpp)
+                       continue;
+               if (valid_dsc_bpp[i] > dsc_max_bpp)
                        break;
 
                ret = dsc_compute_link_config(intel_dp,
@@ -6557,6 +6559,7 @@ intel_dp_init_connector(struct intel_digital_port *dig_port,
                intel_connector->get_hw_state = intel_ddi_connector_get_hw_state;
        else
                intel_connector->get_hw_state = intel_connector_get_hw_state;
+       intel_connector->sync_state = intel_dp_connector_sync_state;
 
        if (!intel_edp_init_connector(intel_dp, intel_connector)) {
                intel_dp_aux_fini(intel_dp);
index 53aec023ce92fae91e653adf9278b6a81eae3040..b651c990af85f70b17510effdfdba35235dbf51f 100644 (file)
@@ -1355,7 +1355,7 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
                return 0;
        }
 
-       if (DISPLAY_VER(dev_priv) >= 10 &&
+       if (HAS_DSC_MST(dev_priv) &&
            drm_dp_sink_supports_dsc(intel_connector->dp.dsc_dpcd)) {
                /*
                 * TBD pass the connector BPC,
index 6927785fd6ff2fed2406a6ca1889cdf455f548e7..b6e539f1342c29ad97f5f46de8b51d9a358375bb 100644 (file)
@@ -1994,6 +1994,7 @@ static void psr_force_hw_tracking_exit(struct intel_dp *intel_dp)
 
 void intel_psr2_program_trans_man_trk_ctl(const struct intel_crtc_state *crtc_state)
 {
+       struct intel_crtc *crtc = to_intel_crtc(crtc_state->uapi.crtc);
        struct drm_i915_private *dev_priv = to_i915(crtc_state->uapi.crtc->dev);
        enum transcoder cpu_transcoder = crtc_state->cpu_transcoder;
        struct intel_encoder *encoder;
@@ -2013,6 +2014,12 @@ void intel_psr2_program_trans_man_trk_ctl(const struct intel_crtc_state *crtc_st
 
        intel_de_write(dev_priv, PSR2_MAN_TRK_CTL(cpu_transcoder),
                       crtc_state->psr2_man_track_ctl);
+
+       if (!crtc_state->enable_psr2_su_region_et)
+               return;
+
+       intel_de_write(dev_priv, PIPE_SRCSZ_ERLY_TPT(crtc->pipe),
+                      crtc_state->pipe_srcsz_early_tpt);
 }
 
 static void psr2_man_trk_ctl_calc(struct intel_crtc_state *crtc_state,
@@ -2051,6 +2058,20 @@ exit:
        crtc_state->psr2_man_track_ctl = val;
 }
 
+static u32 psr2_pipe_srcsz_early_tpt_calc(struct intel_crtc_state *crtc_state,
+                                         bool full_update)
+{
+       int width, height;
+
+       if (!crtc_state->enable_psr2_su_region_et || full_update)
+               return 0;
+
+       width = drm_rect_width(&crtc_state->psr2_su_area);
+       height = drm_rect_height(&crtc_state->psr2_su_area);
+
+       return PIPESRC_WIDTH(width - 1) | PIPESRC_HEIGHT(height - 1);
+}
+
 static void clip_area_update(struct drm_rect *overlap_damage_area,
                             struct drm_rect *damage_area,
                             struct drm_rect *pipe_src)
@@ -2095,21 +2116,36 @@ static void intel_psr2_sel_fetch_pipe_alignment(struct intel_crtc_state *crtc_st
  * cursor fully when cursor is in SU area.
  */
 static void
-intel_psr2_sel_fetch_et_alignment(struct intel_crtc_state *crtc_state,
-                                 struct intel_plane_state *cursor_state)
+intel_psr2_sel_fetch_et_alignment(struct intel_atomic_state *state,
+                                 struct intel_crtc *crtc)
 {
-       struct drm_rect inter;
+       struct intel_crtc_state *crtc_state = intel_atomic_get_new_crtc_state(state, crtc);
+       struct intel_plane_state *new_plane_state;
+       struct intel_plane *plane;
+       int i;
 
-       if (!crtc_state->enable_psr2_su_region_et ||
-           !cursor_state->uapi.visible)
+       if (!crtc_state->enable_psr2_su_region_et)
                return;
 
-       inter = crtc_state->psr2_su_area;
-       if (!drm_rect_intersect(&inter, &cursor_state->uapi.dst))
-               return;
+       for_each_new_intel_plane_in_state(state, plane, new_plane_state, i) {
+               struct drm_rect inter;
 
-       clip_area_update(&crtc_state->psr2_su_area, &cursor_state->uapi.dst,
-                        &crtc_state->pipe_src);
+               if (new_plane_state->uapi.crtc != crtc_state->uapi.crtc)
+                       continue;
+
+               if (plane->id != PLANE_CURSOR)
+                       continue;
+
+               if (!new_plane_state->uapi.visible)
+                       continue;
+
+               inter = crtc_state->psr2_su_area;
+               if (!drm_rect_intersect(&inter, &new_plane_state->uapi.dst))
+                       continue;
+
+               clip_area_update(&crtc_state->psr2_su_area, &new_plane_state->uapi.dst,
+                                &crtc_state->pipe_src);
+       }
 }
 
 /*
@@ -2152,8 +2188,7 @@ int intel_psr2_sel_fetch_update(struct intel_atomic_state *state,
 {
        struct drm_i915_private *dev_priv = to_i915(state->base.dev);
        struct intel_crtc_state *crtc_state = intel_atomic_get_new_crtc_state(state, crtc);
-       struct intel_plane_state *new_plane_state, *old_plane_state,
-               *cursor_plane_state = NULL;
+       struct intel_plane_state *new_plane_state, *old_plane_state;
        struct intel_plane *plane;
        bool full_update = false;
        int i, ret;
@@ -2238,13 +2273,6 @@ int intel_psr2_sel_fetch_update(struct intel_atomic_state *state,
                damaged_area.x2 += new_plane_state->uapi.dst.x1 - src.x1;
 
                clip_area_update(&crtc_state->psr2_su_area, &damaged_area, &crtc_state->pipe_src);
-
-               /*
-                * Cursor plane new state is stored to adjust su area to cover
-                * cursor are fully.
-                */
-               if (plane->id == PLANE_CURSOR)
-                       cursor_plane_state = new_plane_state;
        }
 
        /*
@@ -2273,9 +2301,13 @@ int intel_psr2_sel_fetch_update(struct intel_atomic_state *state,
        if (ret)
                return ret;
 
-       /* Adjust su area to cover cursor fully as necessary */
-       if (cursor_plane_state)
-               intel_psr2_sel_fetch_et_alignment(crtc_state, cursor_plane_state);
+       /*
+        * Adjust su area to cover cursor fully as necessary (early
+        * transport). This needs to be done after
+        * drm_atomic_add_affected_planes to ensure visible cursor is added into
+        * affected planes even when cursor is not updated by itself.
+        */
+       intel_psr2_sel_fetch_et_alignment(state, crtc);
 
        intel_psr2_sel_fetch_pipe_alignment(crtc_state);
 
@@ -2338,6 +2370,8 @@ int intel_psr2_sel_fetch_update(struct intel_atomic_state *state,
 
 skip_sel_fetch_set_loop:
        psr2_man_trk_ctl_calc(crtc_state, full_update);
+       crtc_state->pipe_srcsz_early_tpt =
+               psr2_pipe_srcsz_early_tpt_calc(crtc_state, full_update);
        return 0;
 }
 
index fa46d2308b0ed3b0d6bd5054f7ffbf4f5701128a..81bf2216371be6a5e16fe15a1bc23ef6c0b5b46c 100644 (file)
@@ -961,6 +961,9 @@ static int gen8_init_rsvd(struct i915_address_space *vm)
        struct i915_vma *vma;
        int ret;
 
+       if (!intel_gt_needs_wa_16018031267(vm->gt))
+               return 0;
+
        /* The memory will be used only by GPU. */
        obj = i915_gem_object_create_lmem(i915, PAGE_SIZE,
                                          I915_BO_ALLOC_VOLATILE |
index 1ade568ffbfa43409129228881abe60d965e8d10..7a6dc371c384eb3d1f2639d5a767072a3bc554f4 100644 (file)
@@ -908,6 +908,23 @@ static intel_engine_mask_t init_engine_mask(struct intel_gt *gt)
                info->engine_mask &= ~BIT(GSC0);
        }
 
+       /*
+        * Do not create the command streamer for CCS slices beyond the first.
+        * All the workload submitted to the first engine will be shared among
+        * all the slices.
+        *
+        * Once the user will be allowed to customize the CCS mode, then this
+        * check needs to be removed.
+        */
+       if (IS_DG2(gt->i915)) {
+               u8 first_ccs = __ffs(CCS_MASK(gt));
+
+               /* Mask off all the CCS engines */
+               info->engine_mask &= ~GENMASK(CCS3, CCS0);
+               /* Put back in the first CCS engine */
+               info->engine_mask |= BIT(_CCS(first_ccs));
+       }
+
        return info->engine_mask;
 }
 
index a425db5ed3a22c38af996ce2183d6fa030ed60b2..6a2c2718bcc38e645903031ce00cd667c1ee5411 100644 (file)
@@ -1024,6 +1024,12 @@ enum i915_map_type intel_gt_coherent_map_type(struct intel_gt *gt,
                return I915_MAP_WC;
 }
 
+bool intel_gt_needs_wa_16018031267(struct intel_gt *gt)
+{
+       /* Wa_16018031267, Wa_16018063123 */
+       return IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 55), IP_VER(12, 71));
+}
+
 bool intel_gt_needs_wa_22016122933(struct intel_gt *gt)
 {
        return MEDIA_VER_FULL(gt->i915) == IP_VER(13, 0) && gt->type == GT_MEDIA;
index 608f5c87292857c6b2777bbd809c5bd87a48238c..003eb93b826fd06fa122650b99b7cdbb09fe3161 100644 (file)
@@ -82,17 +82,18 @@ struct drm_printer;
                  ##__VA_ARGS__);                                       \
 } while (0)
 
-#define NEEDS_FASTCOLOR_BLT_WABB(engine) ( \
-       IS_GFX_GT_IP_RANGE(engine->gt, IP_VER(12, 55), IP_VER(12, 71)) && \
-       engine->class == COPY_ENGINE_CLASS && engine->instance == 0)
-
 static inline bool gt_is_root(struct intel_gt *gt)
 {
        return !gt->info.id;
 }
 
+bool intel_gt_needs_wa_16018031267(struct intel_gt *gt);
 bool intel_gt_needs_wa_22016122933(struct intel_gt *gt);
 
+#define NEEDS_FASTCOLOR_BLT_WABB(engine) ( \
+       intel_gt_needs_wa_16018031267(engine->gt) && \
+       engine->class == COPY_ENGINE_CLASS && engine->instance == 0)
+
 static inline struct intel_gt *uc_to_gt(struct intel_uc *uc)
 {
        return container_of(uc, struct intel_gt, uc);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
new file mode 100644 (file)
index 0000000..044219c
--- /dev/null
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#include "i915_drv.h"
+#include "intel_gt.h"
+#include "intel_gt_ccs_mode.h"
+#include "intel_gt_regs.h"
+
+void intel_gt_apply_ccs_mode(struct intel_gt *gt)
+{
+       int cslice;
+       u32 mode = 0;
+       int first_ccs = __ffs(CCS_MASK(gt));
+
+       if (!IS_DG2(gt->i915))
+               return;
+
+       /* Build the value for the fixed CCS load balancing */
+       for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
+               if (CCS_MASK(gt) & BIT(cslice))
+                       /*
+                        * If available, assign the cslice
+                        * to the first available engine...
+                        */
+                       mode |= XEHP_CCS_MODE_CSLICE(cslice, first_ccs);
+
+               else
+                       /*
+                        * ... otherwise, mark the cslice as
+                        * unavailable if no CCS dispatches here
+                        */
+                       mode |= XEHP_CCS_MODE_CSLICE(cslice,
+                                                    XEHP_CCS_MODE_CSLICE_MASK);
+       }
+
+       intel_uncore_write(gt->uncore, XEHP_CCS_MODE, mode);
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
new file mode 100644 (file)
index 0000000..9e5549c
--- /dev/null
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#ifndef __INTEL_GT_CCS_MODE_H__
+#define __INTEL_GT_CCS_MODE_H__
+
+struct intel_gt;
+
+void intel_gt_apply_ccs_mode(struct intel_gt *gt);
+
+#endif /* __INTEL_GT_CCS_MODE_H__ */
index 50962cfd1353ae4673b27a9bb2437d47633b5651..743fe35667227451436205f9e44514df1c4e809b 100644 (file)
 #define   ECOBITS_PPGTT_CACHE4B                        (0 << 8)
 
 #define GEN12_RCU_MODE                         _MMIO(0x14800)
+#define   XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE   REG_BIT(1)
 #define   GEN12_RCU_MODE_CCS_ENABLE            REG_BIT(0)
 
+#define XEHP_CCS_MODE                          _MMIO(0x14804)
+#define   XEHP_CCS_MODE_CSLICE_MASK            REG_GENMASK(2, 0) /* CCS0-3 + rsvd */
+#define   XEHP_CCS_MODE_CSLICE_WIDTH           ilog2(XEHP_CCS_MODE_CSLICE_MASK + 1)
+#define   XEHP_CCS_MODE_CSLICE(cslice, ccs)    (ccs << (cslice * XEHP_CCS_MODE_CSLICE_WIDTH))
+
 #define CHV_FUSE_GT                            _MMIO(VLV_GUNIT_BASE + 0x2168)
 #define   CHV_FGT_DISABLE_SS0                  (1 << 10)
 #define   CHV_FGT_DISABLE_SS1                  (1 << 11)
index 25413809b9dc99734409210259a9f51f1fffad88..6ec3582c97357780f823865cf0a9a9581b50d288 100644 (file)
@@ -10,6 +10,7 @@
 #include "intel_engine_regs.h"
 #include "intel_gpu_commands.h"
 #include "intel_gt.h"
+#include "intel_gt_ccs_mode.h"
 #include "intel_gt_mcr.h"
 #include "intel_gt_print.h"
 #include "intel_gt_regs.h"
@@ -51,7 +52,8 @@
  *   registers belonging to BCS, VCS or VECS should be implemented in
  *   xcs_engine_wa_init(). Workarounds for registers not belonging to a specific
 *   engine's MMIO range but that are part of the common RCS/CCS reset domain
- *   should be implemented in general_render_compute_wa_init().
+ *   should be implemented in general_render_compute_wa_init(). Settings
+ *   for CCS load balancing should be added in ccs_engine_wa_mode().
  *
  * - GT workarounds: the list of these WAs is applied whenever these registers
  *   revert to their default values: on GPU reset, suspend/resume [1]_, etc.
@@ -2854,6 +2856,28 @@ add_render_compute_tuning_settings(struct intel_gt *gt,
                wa_write_clr(wal, GEN8_GARBCNTL, GEN12_BUS_HASH_CTL_BIT_EXC);
 }
 
+static void ccs_engine_wa_mode(struct intel_engine_cs *engine, struct i915_wa_list *wal)
+{
+       struct intel_gt *gt = engine->gt;
+
+       if (!IS_DG2(gt->i915))
+               return;
+
+       /*
+        * Wa_14019159160: This workaround, along with others, leads to
+        * significant challenges in utilizing load balancing among the
+        * CCS slices. Consequently, an architectural decision has been
+        * made to completely disable automatic CCS load balancing.
+        */
+       wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE);
+
+       /*
+        * After disabling automatic load balancing, we need to assign all
+        * slices to a single CCS. We will call it CCS mode 1.
+        */
+       intel_gt_apply_ccs_mode(gt);
+}
+
 /*
  * The workarounds in this function apply to shared registers in
  * the general render reset domain that aren't tied to a
@@ -3004,8 +3028,10 @@ engine_init_workarounds(struct intel_engine_cs *engine, struct i915_wa_list *wal
         * to a single RCS/CCS engine's workaround list since
         * they're reset as part of the general render domain reset.
         */
-       if (engine->flags & I915_ENGINE_FIRST_RENDER_COMPUTE)
+       if (engine->flags & I915_ENGINE_FIRST_RENDER_COMPUTE) {
                general_render_compute_wa_init(engine, wal);
+               ccs_engine_wa_mode(engine, wal);
+       }
 
        if (engine->class == COMPUTE_CLASS)
                ccs_engine_wa_init(engine, wal);
index 0a0a11dc9ec03eeba855f47ca57c1ad1c5669f54..ee02cd833c5e4345abdc3fb83968769999ac4340 100644 (file)
@@ -812,15 +812,15 @@ op_remap(struct drm_gpuva_op_remap *r,
        struct drm_gpuva_op_unmap *u = r->unmap;
        struct nouveau_uvma *uvma = uvma_from_va(u->va);
        u64 addr = uvma->va.va.addr;
-       u64 range = uvma->va.va.range;
+       u64 end = uvma->va.va.addr + uvma->va.va.range;
 
        if (r->prev)
                addr = r->prev->va.addr + r->prev->va.range;
 
        if (r->next)
-               range = r->next->va.addr - addr;
+               end = r->next->va.addr;
 
-       op_unmap_range(u, addr, range);
+       op_unmap_range(u, addr, end - addr);
 }
 
 static int
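
The op_remap change above replaces the stored range with an explicit end, so that trimming the start by r->prev can no longer push the unmap window past the original VA span. A userspace sketch of the same clamping; the struct and helper names are simplified stand-ins for the nouveau/drm_gpuva types:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the VA span fields used by op_remap() above. */
struct span { uint64_t addr, range; };

/* The region to unmap is the original span, clamped on the left by an
 * optional kept head (prev) and on the right by an optional kept tail
 * (next). Computing 'end' once keeps the window inside the original VA.
 */
static struct span remap_unmap_window(struct span va,
				      const struct span *prev,
				      const struct span *next)
{
	uint64_t addr = va.addr;
	uint64_t end = va.addr + va.range;

	if (prev)
		addr = prev->addr + prev->range; /* skip the kept head */
	if (next)
		end = next->addr;                /* stop at the kept tail */

	return (struct span){ addr, end - addr };
}
```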
index 986e8d547c94246a5f7bd058e6ddf555ffc651a4..060c74a80eb14b916db3c441e44b137dd15b7336 100644 (file)
@@ -420,7 +420,7 @@ gf100_gr_chan_new(struct nvkm_gr *base, struct nvkm_chan *fifoch,
                        return ret;
        } else {
                ret = nvkm_memory_map(gr->attrib_cb, 0, chan->vmm, chan->attrib_cb,
-                                     &args, sizeof(args));;
+                                     &args, sizeof(args));
                if (ret)
                        return ret;
        }
index 7bcbc4895ec22196acecfd46d0b29490d2c93ee2..271bfa038f5bc90974acd1ed2709d5cbae51ed94 100644 (file)
@@ -25,6 +25,7 @@
 
 #include <subdev/bios.h>
 #include <subdev/bios/init.h>
+#include <subdev/gsp.h>
 
 void
 gm107_devinit_disable(struct nvkm_devinit *init)
@@ -33,10 +34,13 @@ gm107_devinit_disable(struct nvkm_devinit *init)
        u32 r021c00 = nvkm_rd32(device, 0x021c00);
        u32 r021c04 = nvkm_rd32(device, 0x021c04);
 
-       if (r021c00 & 0x00000001)
-               nvkm_subdev_disable(device, NVKM_ENGINE_CE, 0);
-       if (r021c00 & 0x00000004)
-               nvkm_subdev_disable(device, NVKM_ENGINE_CE, 2);
+       /* gsp only wants to enable/disable display */
+       if (!nvkm_gsp_rm(device->gsp)) {
+               if (r021c00 & 0x00000001)
+                       nvkm_subdev_disable(device, NVKM_ENGINE_CE, 0);
+               if (r021c00 & 0x00000004)
+                       nvkm_subdev_disable(device, NVKM_ENGINE_CE, 2);
+       }
        if (r021c04 & 0x00000001)
                nvkm_subdev_disable(device, NVKM_ENGINE_DISP, 0);
 }
index 11b4c9c274a1a597cb3592019d873345c241d1cd..666eb93b1742ca5435cf0567e28e1664122bad8b 100644 (file)
@@ -41,6 +41,7 @@ r535_devinit_new(const struct nvkm_devinit_func *hw,
 
        rm->dtor = r535_devinit_dtor;
        rm->post = hw->post;
+       rm->disable = hw->disable;
 
        ret = nv50_devinit_new_(rm, device, type, inst, pdevinit);
        if (ret)
index 9063ce2546422fd93eb0c0b847cab68aac0ee753..fd8e44992184fa2e63a11e1810ea8e79f9e929a4 100644 (file)
@@ -441,19 +441,19 @@ void panfrost_gpu_power_off(struct panfrost_device *pfdev)
 
        gpu_write(pfdev, SHADER_PWROFF_LO, pfdev->features.shader_present);
        ret = readl_relaxed_poll_timeout(pfdev->iomem + SHADER_PWRTRANS_LO,
-                                        val, !val, 1, 1000);
+                                        val, !val, 1, 2000);
        if (ret)
                dev_err(pfdev->dev, "shader power transition timeout");
 
        gpu_write(pfdev, TILER_PWROFF_LO, pfdev->features.tiler_present);
        ret = readl_relaxed_poll_timeout(pfdev->iomem + TILER_PWRTRANS_LO,
-                                        val, !val, 1, 1000);
+                                        val, !val, 1, 2000);
        if (ret)
                dev_err(pfdev->dev, "tiler power transition timeout");
 
        gpu_write(pfdev, L2_PWROFF_LO, pfdev->features.l2_present);
        ret = readl_poll_timeout(pfdev->iomem + L2_PWRTRANS_LO,
-                                val, !val, 0, 1000);
+                                val, !val, 0, 2000);
        if (ret)
                dev_err(pfdev->dev, "l2 power transition timeout");
 }
index ca85e81fdb44383ffdafdb48a98a843cb1884b71..d32ff3857e65838d460d507440d576601fa02f03 100644 (file)
@@ -193,6 +193,9 @@ static void xe_device_destroy(struct drm_device *dev, void *dummy)
 {
        struct xe_device *xe = to_xe_device(dev);
 
+       if (xe->preempt_fence_wq)
+               destroy_workqueue(xe->preempt_fence_wq);
+
        if (xe->ordered_wq)
                destroy_workqueue(xe->ordered_wq);
 
@@ -258,9 +261,15 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
        INIT_LIST_HEAD(&xe->pinned.external_vram);
        INIT_LIST_HEAD(&xe->pinned.evicted);
 
+       xe->preempt_fence_wq = alloc_ordered_workqueue("xe-preempt-fence-wq", 0);
        xe->ordered_wq = alloc_ordered_workqueue("xe-ordered-wq", 0);
        xe->unordered_wq = alloc_workqueue("xe-unordered-wq", 0, 0);
-       if (!xe->ordered_wq || !xe->unordered_wq) {
+       if (!xe->ordered_wq || !xe->unordered_wq ||
+           !xe->preempt_fence_wq) {
+               /*
+                * Cleanup is done in xe_device_destroy via the
+                * drmm_add_action_or_reset registered above.
+                */
                drm_err(&xe->drm, "Failed to allocate xe workqueues\n");
                err = -ENOMEM;
                goto err;
index 9785eef2e5a4e6566c452e1fa8c45c447fe00b76..8e3a222b41cf0a4dda7286b10566e6def0d97ad4 100644 (file)
@@ -363,6 +363,9 @@ struct xe_device {
        /** @ufence_wq: user fence wait queue */
        wait_queue_head_t ufence_wq;
 
+       /** @preempt_fence_wq: used to serialize preempt fences */
+       struct workqueue_struct *preempt_fence_wq;
+
        /** @ordered_wq: used to serialize compute mode resume */
        struct workqueue_struct *ordered_wq;
 
index 826c8b389672502dfebd6e89c6c1997bf8f0c9a2..cc5e0f75de3c7350770323aeea9570ddd89d48bb 100644 (file)
  *     Unlock all
  */
 
+/*
+ * Add validation and rebinding to the drm_exec locking loop, since both can
+ * trigger eviction which may require sleeping dma_resv locks.
+ */
 static int xe_exec_fn(struct drm_gpuvm_exec *vm_exec)
 {
        struct xe_vm *vm = container_of(vm_exec->vm, struct xe_vm, gpuvm);
-       struct drm_gem_object *obj;
-       unsigned long index;
-       int num_fences;
-       int ret;
-
-       ret = drm_gpuvm_validate(vm_exec->vm, &vm_exec->exec);
-       if (ret)
-               return ret;
-
-       /*
-        * 1 fence slot for the final submit, and 1 more for every per-tile for
-        * GPU bind and 1 extra for CPU bind. Note that there are potentially
-        * many vma per object/dma-resv, however the fence slot will just be
-        * re-used, since they are largely the same timeline and the seqno
-        * should be in order. In the case of CPU bind there is dummy fence used
-        * for all CPU binds, so no need to have a per-tile slot for that.
-        */
-       num_fences = 1 + 1 + vm->xe->info.tile_count;
 
-       /*
-        * We don't know upfront exactly how many fence slots we will need at
-        * the start of the exec, since the TTM bo_validate above can consume
-        * numerous fence slots. Also due to how the dma_resv_reserve_fences()
-        * works it only ensures that at least that many fence slots are
-        * available i.e if there are already 10 slots available and we reserve
-        * two more, it can just noop without reserving anything.  With this it
-        * is quite possible that TTM steals some of the fence slots and then
-        * when it comes time to do the vma binding and final exec stage we are
-        * lacking enough fence slots, leading to some nasty BUG_ON() when
-        * adding the fences. Hence just add our own fences here, after the
-        * validate stage.
-        */
-       drm_exec_for_each_locked_object(&vm_exec->exec, index, obj) {
-               ret = dma_resv_reserve_fences(obj->resv, num_fences);
-               if (ret)
-                       return ret;
-       }
-
-       return 0;
+       /* The fence slot added here is intended for the exec sched job. */
+       return xe_vm_validate_rebind(vm, &vm_exec->exec, 1);
 }
 
 int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
@@ -152,7 +120,6 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
        struct drm_exec *exec = &vm_exec.exec;
        u32 i, num_syncs = 0, num_ufence = 0;
        struct xe_sched_job *job;
-       struct dma_fence *rebind_fence;
        struct xe_vm *vm;
        bool write_locked, skip_retry = false;
        ktime_t end = 0;
@@ -290,39 +257,7 @@ retry:
                goto err_exec;
        }
 
-       /*
-        * Rebind any invalidated userptr or evicted BOs in the VM, non-compute
-        * VM mode only.
-        */
-       rebind_fence = xe_vm_rebind(vm, false);
-       if (IS_ERR(rebind_fence)) {
-               err = PTR_ERR(rebind_fence);
-               goto err_put_job;
-       }
-
-       /*
-        * We store the rebind_fence in the VM so subsequent execs don't get
-        * scheduled before the rebinds of userptrs / evicted BOs is complete.
-        */
-       if (rebind_fence) {
-               dma_fence_put(vm->rebind_fence);
-               vm->rebind_fence = rebind_fence;
-       }
-       if (vm->rebind_fence) {
-               if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
-                            &vm->rebind_fence->flags)) {
-                       dma_fence_put(vm->rebind_fence);
-                       vm->rebind_fence = NULL;
-               } else {
-                       dma_fence_get(vm->rebind_fence);
-                       err = drm_sched_job_add_dependency(&job->drm,
-                                                          vm->rebind_fence);
-                       if (err)
-                               goto err_put_job;
-               }
-       }
-
-       /* Wait behind munmap style rebinds */
+       /* Wait behind rebinds */
        if (!xe_vm_in_lr_mode(vm)) {
                err = drm_sched_job_add_resv_dependencies(&job->drm,
                                                          xe_vm_resv(vm),
index 62b3d9d1d7cdd4f2d65c55db414a00b7bd7fbd06..462b331950320c0e49901fb09c32a8cdcffc1745 100644 (file)
@@ -148,6 +148,11 @@ struct xe_exec_queue {
        const struct xe_ring_ops *ring_ops;
        /** @entity: DRM sched entity for this exec queue (1 to 1 relationship) */
        struct drm_sched_entity *entity;
+       /**
+        * @tlb_flush_seqno: The seqno of the last rebind TLB flush performed.
+        * Protected by @vm's resv. Unused if @vm == NULL.
+        */
+       u64 tlb_flush_seqno;
        /** @lrc: logical ring context for this exec queue */
        struct xe_lrc lrc[];
 };
index 241c294270d9167f25d1898f8f590c7aabb06ca0..fa9e9853c53ba605e0e35870bed69e7d09d25934 100644 (file)
@@ -100,10 +100,9 @@ static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
 {
        struct xe_bo *bo = xe_vma_bo(vma);
        struct xe_vm *vm = xe_vma_vm(vma);
-       unsigned int num_shared = 2; /* slots for bind + move */
        int err;
 
-       err = xe_vm_prepare_vma(exec, vma, num_shared);
+       err = xe_vm_lock_vma(exec, vma);
        if (err)
                return err;
 
index f03e077f81a04fcb9344f8c634856acab516c6f1..e598a4363d0190504d9ca8d826d7d996f0d2dfaf 100644 (file)
@@ -61,7 +61,6 @@ int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
        INIT_LIST_HEAD(&gt->tlb_invalidation.pending_fences);
        spin_lock_init(&gt->tlb_invalidation.pending_lock);
        spin_lock_init(&gt->tlb_invalidation.lock);
-       gt->tlb_invalidation.fence_context = dma_fence_context_alloc(1);
        INIT_DELAYED_WORK(&gt->tlb_invalidation.fence_tdr,
                          xe_gt_tlb_fence_timeout);
 
index 70c615dd14986599324a2fb68f766889761c7eb1..07b2f724ec45685feaa4b5ab86b6f2011f65198e 100644 (file)
@@ -177,13 +177,6 @@ struct xe_gt {
                 * xe_gt_tlb_fence_timeout after the timeout interval is over.
                 */
                struct delayed_work fence_tdr;
-               /** @tlb_invalidation.fence_context: context for TLB invalidation fences */
-               u64 fence_context;
-               /**
-                * @tlb_invalidation.fence_seqno: seqno to TLB invalidation fences, protected by
-                * tlb_invalidation.lock
-                */
-               u32 fence_seqno;
                /** @tlb_invalidation.lock: protects TLB invalidation fences */
                spinlock_t lock;
        } tlb_invalidation;
index 7bce2a332603c086bf4bed63c212cdff311f6bbf..7d50c6e89d8e7dc0ba718b9439ef86858e1f3992 100644 (file)
@@ -49,7 +49,7 @@ static bool preempt_fence_enable_signaling(struct dma_fence *fence)
        struct xe_exec_queue *q = pfence->q;
 
        pfence->error = q->ops->suspend(q);
-       queue_work(system_unbound_wq, &pfence->preempt_work);
+       queue_work(q->vm->xe->preempt_fence_wq, &pfence->preempt_work);
        return true;
 }
 
index 7f54bc3e389d58f8023f3a1092aa47d3e852a16b..4efc8c1a3d7a99e00107aeb88c803db26cb62881 100644 (file)
@@ -1135,8 +1135,7 @@ static int invalidation_fence_init(struct xe_gt *gt,
        spin_lock_irq(&gt->tlb_invalidation.lock);
        dma_fence_init(&ifence->base.base, &invalidation_fence_ops,
                       &gt->tlb_invalidation.lock,
-                      gt->tlb_invalidation.fence_context,
-                      ++gt->tlb_invalidation.fence_seqno);
+                      dma_fence_context_alloc(1), 1);
        spin_unlock_irq(&gt->tlb_invalidation.lock);
 
        INIT_LIST_HEAD(&ifence->base.link);
@@ -1236,6 +1235,13 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
        err = xe_pt_prepare_bind(tile, vma, entries, &num_entries);
        if (err)
                goto err;
+
+       err = dma_resv_reserve_fences(xe_vm_resv(vm), 1);
+       if (!err && !xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
+               err = dma_resv_reserve_fences(xe_vma_bo(vma)->ttm.base.resv, 1);
+       if (err)
+               goto err;
+
        xe_tile_assert(tile, num_entries <= ARRAY_SIZE(entries));
 
        xe_vm_dbg_print_entries(tile_to_xe(tile), entries, num_entries);
@@ -1254,11 +1260,13 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
         * non-faulting LR, in particular on user-space batch buffer chaining,
         * it needs to be done here.
         */
-       if ((rebind && !xe_vm_in_lr_mode(vm) && !vm->batch_invalidate_tlb) ||
-           (!rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm))) {
+       if (!rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm)) {
                ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
                if (!ifence)
                        return ERR_PTR(-ENOMEM);
+       } else if (rebind && !xe_vm_in_lr_mode(vm)) {
+               /* We also bump if batch_invalidate_tlb is true */
+               vm->tlb_flush_seqno++;
        }
 
        rfence = kzalloc(sizeof(*rfence), GFP_KERNEL);
@@ -1297,7 +1305,7 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
                }
 
                /* add shared fence now for pagetable delayed destroy */
-               dma_resv_add_fence(xe_vm_resv(vm), fence, !rebind &&
+               dma_resv_add_fence(xe_vm_resv(vm), fence, rebind ||
                                   last_munmap_rebind ?
                                   DMA_RESV_USAGE_KERNEL :
                                   DMA_RESV_USAGE_BOOKKEEP);
@@ -1576,6 +1584,7 @@ __xe_pt_unbind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queu
        struct dma_fence *fence = NULL;
        struct invalidation_fence *ifence;
        struct xe_range_fence *rfence;
+       int err;
 
        LLIST_HEAD(deferred);
 
@@ -1593,6 +1602,12 @@ __xe_pt_unbind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queu
        xe_pt_calc_rfence_interval(vma, &unbind_pt_update, entries,
                                   num_entries);
 
+       err = dma_resv_reserve_fences(xe_vm_resv(vm), 1);
+       if (!err && !xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
+               err = dma_resv_reserve_fences(xe_vma_bo(vma)->ttm.base.resv, 1);
+       if (err)
+               return ERR_PTR(err);
+
        ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
        if (!ifence)
                return ERR_PTR(-ENOMEM);
index c4edffcd4a320666d576d950ab15dc614545a053..5b2b37b598130ac464a2c344bad52b731e778e28 100644 (file)
@@ -219,10 +219,9 @@ static void __emit_job_gen12_simple(struct xe_sched_job *job, struct xe_lrc *lrc
 {
        u32 dw[MAX_JOB_SIZE_DW], i = 0;
        u32 ppgtt_flag = get_ppgtt_flag(job);
-       struct xe_vm *vm = job->q->vm;
        struct xe_gt *gt = job->q->gt;
 
-       if (vm && vm->batch_invalidate_tlb) {
+       if (job->ring_ops_flush_tlb) {
                dw[i++] = preparser_disable(true);
                i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
                                        seqno, true, dw, i);
@@ -270,7 +269,6 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
        struct xe_gt *gt = job->q->gt;
        struct xe_device *xe = gt_to_xe(gt);
        bool decode = job->q->class == XE_ENGINE_CLASS_VIDEO_DECODE;
-       struct xe_vm *vm = job->q->vm;
 
        dw[i++] = preparser_disable(true);
 
@@ -282,13 +280,13 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
                        i = emit_aux_table_inv(gt, VE0_AUX_INV, dw, i);
        }
 
-       if (vm && vm->batch_invalidate_tlb)
+       if (job->ring_ops_flush_tlb)
                i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
                                        seqno, true, dw, i);
 
        dw[i++] = preparser_disable(false);
 
-       if (!vm || !vm->batch_invalidate_tlb)
+       if (!job->ring_ops_flush_tlb)
                i = emit_store_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
                                        seqno, dw, i);
 
@@ -317,7 +315,6 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
        struct xe_gt *gt = job->q->gt;
        struct xe_device *xe = gt_to_xe(gt);
        bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
-       struct xe_vm *vm = job->q->vm;
        u32 mask_flags = 0;
 
        dw[i++] = preparser_disable(true);
@@ -327,7 +324,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
                mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;
 
        /* See __xe_pt_bind_vma() for a discussion on TLB invalidations. */
-       i = emit_pipe_invalidate(mask_flags, vm && vm->batch_invalidate_tlb, dw, i);
+       i = emit_pipe_invalidate(mask_flags, job->ring_ops_flush_tlb, dw, i);
 
        /* hsdes: 1809175790 */
        if (has_aux_ccs(xe))
index 8151ddafb940756d87dbca45e6d3407354535ce4..b0c7fa4693cfe4a999b93b3878cb72c6150ebcbd 100644 (file)
@@ -250,6 +250,16 @@ bool xe_sched_job_completed(struct xe_sched_job *job)
 
 void xe_sched_job_arm(struct xe_sched_job *job)
 {
+       struct xe_exec_queue *q = job->q;
+       struct xe_vm *vm = q->vm;
+
+       if (vm && !xe_sched_job_is_migration(q) && !xe_vm_in_lr_mode(vm) &&
+           (vm->batch_invalidate_tlb || vm->tlb_flush_seqno != q->tlb_flush_seqno)) {
+               xe_vm_assert_held(vm);
+               q->tlb_flush_seqno = vm->tlb_flush_seqno;
+               job->ring_ops_flush_tlb = true;
+       }
+
        drm_sched_job_arm(&job->drm);
 }
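
The tlb_flush_seqno handshake added across xe_vm, xe_exec_queue and xe_sched_job_arm() above reduces to a few lines: the VM bumps a counter whenever a rebind requires a TLB flush, and each queue records the last value it flushed for, so only the first job armed per queue after the bump carries ring_ops_flush_tlb. A minimal stand-alone sketch, with simplified stand-in structs rather than the real xe types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the fields this patch adds to xe_vm and xe_exec_queue. */
struct vm    { uint64_t tlb_flush_seqno; bool batch_invalidate_tlb; };
struct queue { uint64_t tlb_flush_seqno; };

/* Mirror of the check in xe_sched_job_arm(): returns the value that
 * would be assigned to job->ring_ops_flush_tlb. On a seqno mismatch the
 * queue consumes the request, so later jobs on this queue skip the flush.
 */
static bool arm_job(struct vm *vm, struct queue *q)
{
	if (vm->batch_invalidate_tlb ||
	    vm->tlb_flush_seqno != q->tlb_flush_seqno) {
		q->tlb_flush_seqno = vm->tlb_flush_seqno;
		return true;
	}
	return false;
}
```

One bump, one flush per queue: a second job armed on the same queue sees matching seqnos and skips the flush, while batch_invalidate_tlb forces it on every job.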
 
index b1d83da50a53da59b6d72af1bbd21c8d98ca3517..5e12724219fdd485f2b770bd4b31e78aa2ab42af 100644 (file)
@@ -39,6 +39,8 @@ struct xe_sched_job {
        } user_fence;
        /** @migrate_flush_flags: Additional flush flags for migration jobs */
        u32 migrate_flush_flags;
+       /** @ring_ops_flush_tlb: The ring ops need to flush TLB before payload. */
+       bool ring_ops_flush_tlb;
        /** @batch_addr: batch buffer address of job */
        u64 batch_addr[];
 };
index f88faef4142bde018f336d33d3e2eed726a4bc29..62d1ef8867a84351ae7444d63113d8867dfbb0c5 100644 (file)
@@ -482,17 +482,53 @@ static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
        return 0;
 }
 
+/**
+ * xe_vm_validate_rebind() - Validate buffer objects and rebind vmas
+ * @vm: The vm for which we are rebinding.
+ * @exec: The struct drm_exec with the locked GEM objects.
+ * @num_fences: The number of fences to reserve for the operation, not
+ * including rebinds and validations.
+ *
+ * Validates all evicted gem objects and rebinds their vmas. Note that
+ * rebindings may cause evictions and hence the validation-rebind
+ * sequence is rerun until there are no more objects to validate.
+ *
+ * Return: 0 on success, negative error code on error. In particular,
+ * may return -EINTR or -ERESTARTSYS if interrupted, and -EDEADLK if
+ * the drm_exec transaction needs to be restarted.
+ */
+int xe_vm_validate_rebind(struct xe_vm *vm, struct drm_exec *exec,
+                         unsigned int num_fences)
+{
+       struct drm_gem_object *obj;
+       unsigned long index;
+       int ret;
+
+       do {
+               ret = drm_gpuvm_validate(&vm->gpuvm, exec);
+               if (ret)
+                       return ret;
+
+               ret = xe_vm_rebind(vm, false);
+               if (ret)
+                       return ret;
+       } while (!list_empty(&vm->gpuvm.evict.list));
+
+       drm_exec_for_each_locked_object(exec, index, obj) {
+               ret = dma_resv_reserve_fences(obj->resv, num_fences);
+               if (ret)
+                       return ret;
+       }
+
+       return 0;
+}
+
 static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
                                 bool *done)
 {
        int err;
 
-       /*
-        * 1 fence for each preempt fence plus a fence for each tile from a
-        * possible rebind
-        */
-       err = drm_gpuvm_prepare_vm(&vm->gpuvm, exec, vm->preempt.num_exec_queues +
-                                  vm->xe->info.tile_count);
+       err = drm_gpuvm_prepare_vm(&vm->gpuvm, exec, 0);
        if (err)
                return err;
 
@@ -507,7 +543,7 @@ static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
                return 0;
        }
 
-       err = drm_gpuvm_prepare_objects(&vm->gpuvm, exec, vm->preempt.num_exec_queues);
+       err = drm_gpuvm_prepare_objects(&vm->gpuvm, exec, 0);
        if (err)
                return err;
 
@@ -515,14 +551,19 @@ static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
        if (err)
                return err;
 
-       return drm_gpuvm_validate(&vm->gpuvm, exec);
+       /*
+        * Add validation and rebinding to the locking loop since both can
+        * cause evictions which may require blocking dma_resv locks.
+        * The fence reservation here is intended for the new preempt fences
+        * we attach at the end of the rebind work.
+        */
+       return xe_vm_validate_rebind(vm, exec, vm->preempt.num_exec_queues);
 }
 
 static void preempt_rebind_work_func(struct work_struct *w)
 {
        struct xe_vm *vm = container_of(w, struct xe_vm, preempt.rebind_work);
        struct drm_exec exec;
-       struct dma_fence *rebind_fence;
        unsigned int fence_count = 0;
        LIST_HEAD(preempt_fences);
        ktime_t end = 0;
@@ -568,18 +609,11 @@ retry:
        if (err)
                goto out_unlock;
 
-       rebind_fence = xe_vm_rebind(vm, true);
-       if (IS_ERR(rebind_fence)) {
-               err = PTR_ERR(rebind_fence);
+       err = xe_vm_rebind(vm, true);
+       if (err)
                goto out_unlock;
-       }
 
-       if (rebind_fence) {
-               dma_fence_wait(rebind_fence, false);
-               dma_fence_put(rebind_fence);
-       }
-
-       /* Wait on munmap style VM unbinds */
+       /* Wait on rebinds and munmap style VM unbinds */
        wait = dma_resv_wait_timeout(xe_vm_resv(vm),
                                     DMA_RESV_USAGE_KERNEL,
                                     false, MAX_SCHEDULE_TIMEOUT);
@@ -773,14 +807,14 @@ xe_vm_bind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
               struct xe_sync_entry *syncs, u32 num_syncs,
               bool first_op, bool last_op);
 
-struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
+int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
 {
-       struct dma_fence *fence = NULL;
+       struct dma_fence *fence;
        struct xe_vma *vma, *next;
 
        lockdep_assert_held(&vm->lock);
        if (xe_vm_in_lr_mode(vm) && !rebind_worker)
-               return NULL;
+               return 0;
 
        xe_vm_assert_held(vm);
        list_for_each_entry_safe(vma, next, &vm->rebind_list,
@@ -788,17 +822,17 @@ struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
                xe_assert(vm->xe, vma->tile_present);
 
                list_del_init(&vma->combined_links.rebind);
-               dma_fence_put(fence);
                if (rebind_worker)
                        trace_xe_vma_rebind_worker(vma);
                else
                        trace_xe_vma_rebind_exec(vma);
                fence = xe_vm_bind_vma(vma, NULL, NULL, 0, false, false);
                if (IS_ERR(fence))
-                       return fence;
+                       return PTR_ERR(fence);
+               dma_fence_put(fence);
        }
 
-       return fence;
+       return 0;
 }
 
 static void xe_vma_free(struct xe_vma *vma)
@@ -1004,35 +1038,26 @@ static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
 }
 
 /**
- * xe_vm_prepare_vma() - drm_exec utility to lock a vma
+ * xe_vm_lock_vma() - drm_exec utility to lock a vma
  * @exec: The drm_exec object we're currently locking for.
 * @vma: The vma for which we want to lock the vm resv and any attached
  * object's resv.
- * @num_shared: The number of dma-fence slots to pre-allocate in the
- * objects' reservation objects.
  *
  * Return: 0 on success, negative error code on error. In particular
  * may return -EDEADLK on WW transaction contention and -EINTR if
  * an interruptible wait is terminated by a signal.
  */
-int xe_vm_prepare_vma(struct drm_exec *exec, struct xe_vma *vma,
-                     unsigned int num_shared)
+int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma)
 {
        struct xe_vm *vm = xe_vma_vm(vma);
        struct xe_bo *bo = xe_vma_bo(vma);
        int err;
 
        XE_WARN_ON(!vm);
-       if (num_shared)
-               err = drm_exec_prepare_obj(exec, xe_vm_obj(vm), num_shared);
-       else
-               err = drm_exec_lock_obj(exec, xe_vm_obj(vm));
-       if (!err && bo && !bo->vm) {
-               if (num_shared)
-                       err = drm_exec_prepare_obj(exec, &bo->ttm.base, num_shared);
-               else
-                       err = drm_exec_lock_obj(exec, &bo->ttm.base);
-       }
+
+       err = drm_exec_lock_obj(exec, xe_vm_obj(vm));
+       if (!err && bo && !bo->vm)
+               err = drm_exec_lock_obj(exec, &bo->ttm.base);
 
        return err;
 }
@@ -1044,7 +1069,7 @@ static void xe_vma_destroy_unlocked(struct xe_vma *vma)
 
        drm_exec_init(&exec, 0, 0);
        drm_exec_until_all_locked(&exec) {
-               err = xe_vm_prepare_vma(&exec, vma, 0);
+               err = xe_vm_lock_vma(&exec, vma);
                drm_exec_retry_on_contention(&exec);
                if (XE_WARN_ON(err))
                        break;
@@ -1589,7 +1614,6 @@ static void vm_destroy_work_func(struct work_struct *w)
                XE_WARN_ON(vm->pt_root[id]);
 
        trace_xe_vm_free(vm);
-       dma_fence_put(vm->rebind_fence);
        kfree(vm);
 }
 
@@ -2512,7 +2536,7 @@ static int op_execute(struct drm_exec *exec, struct xe_vm *vm,
 
        lockdep_assert_held_write(&vm->lock);
 
-       err = xe_vm_prepare_vma(exec, vma, 1);
+       err = xe_vm_lock_vma(exec, vma);
        if (err)
                return err;
 
index 6df1f1c7f85d98a2b948ba41ec9f1ed5a287faf0..306cd0934a190ba0d5580787522c59e762b3b163 100644 (file)
@@ -207,7 +207,7 @@ int __xe_vm_userptr_needs_repin(struct xe_vm *vm);
 
 int xe_vm_userptr_check_repin(struct xe_vm *vm);
 
-struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker);
+int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker);
 
 int xe_vm_invalidate_vma(struct xe_vma *vma);
 
@@ -242,8 +242,10 @@ bool xe_vm_validate_should_retry(struct drm_exec *exec, int err, ktime_t *end);
 
 int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id);
 
-int xe_vm_prepare_vma(struct drm_exec *exec, struct xe_vma *vma,
-                     unsigned int num_shared);
+int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma);
+
+int xe_vm_validate_rebind(struct xe_vm *vm, struct drm_exec *exec,
+                         unsigned int num_fences);
 
 /**
  * xe_vm_resv() - Returns the vm's reservation object
index ae5fb565f6bf48d52e29c811a8333793e4e128fd..badf3945083d56723cc477b3074929a4db316753 100644 (file)
@@ -177,9 +177,6 @@ struct xe_vm {
         */
        struct list_head rebind_list;
 
-       /** @rebind_fence: rebind fence from execbuf */
-       struct dma_fence *rebind_fence;
-
        /**
         * @destroy_work: worker to destroy VM, needed as a dma_fence signaling
         * from an irq context can be last put and the destroy needs to be able
@@ -264,6 +261,11 @@ struct xe_vm {
                bool capture_once;
        } error_capture;
 
+       /**
+        * @tlb_flush_seqno: Required TLB flush seqno for the next exec,
+        * protected by the vm resv.
+        */
+       u64 tlb_flush_seqno;
        /** @batch_invalidate_tlb: Always invalidate TLB before batch start */
        bool batch_invalidate_tlb;
        /** @xef: XE file handle for tracking this VM's drm client */
index 76f79b68cef84548b86def688b6ba95f4aa46335..888ca636f3f3b009ca542747ddca2119a79daa61 100644 (file)
@@ -324,6 +324,7 @@ static void decode_ISR(unsigned int val)
        decode_bits(KERN_DEBUG "ISR", isr_bits, ARRAY_SIZE(isr_bits), val);
 }
 
+#ifdef CONFIG_I2C_PXA_SLAVE
 static const struct bits icr_bits[] = {
        PXA_BIT(ICR_START,  "START",    NULL),
        PXA_BIT(ICR_STOP,   "STOP",     NULL),
@@ -342,7 +343,6 @@ static const struct bits icr_bits[] = {
        PXA_BIT(ICR_UR,     "UR",               "ur"),
 };
 
-#ifdef CONFIG_I2C_PXA_SLAVE
 static void decode_ICR(unsigned int val)
 {
        decode_bits(KERN_DEBUG "ICR", icr_bits, ARRAY_SIZE(icr_bits), val);
index e7a44929f0daf71f017ec8fe0d1b56243ab47ba5..33228c1c8980f32a5e8af323587601a4783b5b7f 100644 (file)
@@ -3228,7 +3228,7 @@ out:
 static void iommu_snp_enable(void)
 {
 #ifdef CONFIG_KVM_AMD_SEV
-       if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+       if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
                return;
        /*
         * The SNP support requires that IOMMU must be enabled, and is
@@ -3236,12 +3236,14 @@ static void iommu_snp_enable(void)
         */
        if (no_iommu || iommu_default_passthrough()) {
                pr_err("SNP: IOMMU disabled or configured in passthrough mode, SNP cannot be supported.\n");
+               cc_platform_clear(CC_ATTR_HOST_SEV_SNP);
                return;
        }
 
        amd_iommu_snp_en = check_feature(FEATURE_SNP);
        if (!amd_iommu_snp_en) {
                pr_err("SNP: IOMMU SNP feature not enabled, SNP cannot be supported.\n");
+               cc_platform_clear(CC_ATTR_HOST_SEV_SNP);
                return;
        }
 
index 4c34344dc7dcb876e29d66358bcfcc79e1e77705..d7027d600208fc2f7233c5ca01ab7d590ef33042 100644 (file)
@@ -50,12 +50,12 @@ static void mtk_vcodec_vpu_reset_dec_handler(void *priv)
 
        dev_err(&dev->plat_dev->dev, "Watchdog timeout!!");
 
-       mutex_lock(&dev->dev_mutex);
+       mutex_lock(&dev->dev_ctx_lock);
        list_for_each_entry(ctx, &dev->ctx_list, list) {
                ctx->state = MTK_STATE_ABORT;
                mtk_v4l2_vdec_dbg(0, ctx, "[%d] Change to state MTK_STATE_ABORT", ctx->id);
        }
-       mutex_unlock(&dev->dev_mutex);
+       mutex_unlock(&dev->dev_ctx_lock);
 }
 
 static void mtk_vcodec_vpu_reset_enc_handler(void *priv)
@@ -65,12 +65,12 @@ static void mtk_vcodec_vpu_reset_enc_handler(void *priv)
 
        dev_err(&dev->plat_dev->dev, "Watchdog timeout!!");
 
-       mutex_lock(&dev->dev_mutex);
+       mutex_lock(&dev->dev_ctx_lock);
        list_for_each_entry(ctx, &dev->ctx_list, list) {
                ctx->state = MTK_STATE_ABORT;
                mtk_v4l2_vdec_dbg(0, ctx, "[%d] Change to state MTK_STATE_ABORT", ctx->id);
        }
-       mutex_unlock(&dev->dev_mutex);
+       mutex_unlock(&dev->dev_ctx_lock);
 }
 
 static const struct mtk_vcodec_fw_ops mtk_vcodec_vpu_msg = {
index f47c98faf068b6250de0c46a45efbca641a0e0ad..2073781ccadb156116b1cbe86c49b3e06b7a93f3 100644 (file)
@@ -268,7 +268,9 @@ static int fops_vcodec_open(struct file *file)
 
        ctx->dev->vdec_pdata->init_vdec_params(ctx);
 
+       mutex_lock(&dev->dev_ctx_lock);
        list_add(&ctx->list, &dev->ctx_list);
+       mutex_unlock(&dev->dev_ctx_lock);
        mtk_vcodec_dbgfs_create(ctx);
 
        mutex_unlock(&dev->dev_mutex);
@@ -311,7 +313,9 @@ static int fops_vcodec_release(struct file *file)
        v4l2_ctrl_handler_free(&ctx->ctrl_hdl);
 
        mtk_vcodec_dbgfs_remove(dev, ctx->id);
+       mutex_lock(&dev->dev_ctx_lock);
        list_del_init(&ctx->list);
+       mutex_unlock(&dev->dev_ctx_lock);
        kfree(ctx);
        mutex_unlock(&dev->dev_mutex);
        return 0;
@@ -404,6 +408,7 @@ static int mtk_vcodec_probe(struct platform_device *pdev)
        for (i = 0; i < MTK_VDEC_HW_MAX; i++)
                mutex_init(&dev->dec_mutex[i]);
        mutex_init(&dev->dev_mutex);
+       mutex_init(&dev->dev_ctx_lock);
        spin_lock_init(&dev->irqlock);
 
        snprintf(dev->v4l2_dev.name, sizeof(dev->v4l2_dev.name), "%s",
index 849b89dd205c21d686d7fcfc3624df79f99e4449..85b2c0d3d8bcdd3a59027ddccd1efeb4371292c9 100644 (file)
@@ -241,6 +241,7 @@ struct mtk_vcodec_dec_ctx {
  *
  * @dec_mutex: decoder hardware lock
  * @dev_mutex: video_device lock
+ * @dev_ctx_lock: lock protecting the context list
  * @decode_workqueue: decode work queue
  *
  * @irqlock: protect data access by irq handler and work thread
@@ -282,6 +283,7 @@ struct mtk_vcodec_dec_dev {
        /* decoder hardware mutex lock */
        struct mutex dec_mutex[MTK_VDEC_HW_MAX];
        struct mutex dev_mutex;
+       struct mutex dev_ctx_lock;
        struct workqueue_struct *decode_workqueue;
 
        spinlock_t irqlock;
index 06ed47df693bfd049fe5537abb6b994c1b740b85..21836dd6ef85a36f4bfc7e781f0a5b57f6c1962d 100644 (file)
@@ -869,7 +869,6 @@ static int vdec_hevc_slice_init(struct mtk_vcodec_dec_ctx *ctx)
        inst->vpu.codec_type = ctx->current_codec;
        inst->vpu.capture_type = ctx->capture_fourcc;
 
-       ctx->drv_handle = inst;
        err = vpu_dec_init(&inst->vpu);
        if (err) {
                mtk_vdec_err(ctx, "vdec_hevc init err=%d", err);
@@ -898,6 +897,7 @@ static int vdec_hevc_slice_init(struct mtk_vcodec_dec_ctx *ctx)
        mtk_vdec_debug(ctx, "lat hevc instance >> %p, codec_type = 0x%x",
                       inst, inst->vpu.codec_type);
 
+       ctx->drv_handle = inst;
        return 0;
 error_free_inst:
        kfree(inst);
index 19407f9bc773c34445613ed8311fb86b1b565d38..987b3d71b662ac98495604e535f6ece7b733b8dd 100644 (file)
@@ -449,7 +449,7 @@ static int vdec_vp8_decode(void *h_vdec, struct mtk_vcodec_mem *bs,
                       inst->frm_cnt, y_fb_dma, c_fb_dma, fb);
 
        inst->cur_fb = fb;
-       dec->bs_dma = (unsigned long)bs->dma_addr;
+       dec->bs_dma = (uint64_t)bs->dma_addr;
        dec->bs_sz = bs->size;
        dec->cur_y_fb_dma = y_fb_dma;
        dec->cur_c_fb_dma = c_fb_dma;
index 55355fa7009083cacba971e0e3f0981e09f80300..039082f600c813f8e703fd283843ee1bddbe31c8 100644 (file)
@@ -16,6 +16,7 @@
 #include "../vdec_drv_base.h"
 #include "../vdec_vpu_if.h"
 
+#define VP9_MAX_SUPER_FRAMES_NUM 8
 #define VP9_SUPER_FRAME_BS_SZ 64
 #define MAX_VP9_DPB_SIZE       9
 
@@ -133,11 +134,11 @@ struct vp9_sf_ref_fb {
  */
 struct vdec_vp9_vsi {
        unsigned char sf_bs_buf[VP9_SUPER_FRAME_BS_SZ];
-       struct vp9_sf_ref_fb sf_ref_fb[VP9_MAX_FRM_BUF_NUM-1];
+       struct vp9_sf_ref_fb sf_ref_fb[VP9_MAX_SUPER_FRAMES_NUM];
        int sf_next_ref_fb_idx;
        unsigned int sf_frm_cnt;
-       unsigned int sf_frm_offset[VP9_MAX_FRM_BUF_NUM-1];
-       unsigned int sf_frm_sz[VP9_MAX_FRM_BUF_NUM-1];
+       unsigned int sf_frm_offset[VP9_MAX_SUPER_FRAMES_NUM];
+       unsigned int sf_frm_sz[VP9_MAX_SUPER_FRAMES_NUM];
        unsigned int sf_frm_idx;
        unsigned int sf_init;
        struct vdec_fb fb;
@@ -526,7 +527,7 @@ static void vp9_swap_frm_bufs(struct vdec_vp9_inst *inst)
        /* if this is a super frame and not the last sub-frame, get next fb for
         * sub-frame decode
         */
-       if (vsi->sf_frm_cnt > 0 && vsi->sf_frm_idx != vsi->sf_frm_cnt - 1)
+       if (vsi->sf_frm_cnt > 0 && vsi->sf_frm_idx != vsi->sf_frm_cnt)
                vsi->sf_next_ref_fb_idx = vp9_get_sf_ref_fb(inst);
 }
 
@@ -735,7 +736,7 @@ static void get_free_fb(struct vdec_vp9_inst *inst, struct vdec_fb **out_fb)
 
 static int validate_vsi_array_indexes(struct vdec_vp9_inst *inst,
                struct vdec_vp9_vsi *vsi) {
-       if (vsi->sf_frm_idx >= VP9_MAX_FRM_BUF_NUM - 1) {
+       if (vsi->sf_frm_idx > VP9_MAX_SUPER_FRAMES_NUM) {
                mtk_vdec_err(inst->ctx, "Invalid vsi->sf_frm_idx=%u.", vsi->sf_frm_idx);
                return -EIO;
        }
index cf48d09b78d7a156440e1343448af946342d26e9..eea709d93820919d33d13184af7281fe9f0035fc 100644 (file)
@@ -1074,7 +1074,7 @@ static int vdec_vp9_slice_setup_tile_buffer(struct vdec_vp9_slice_instance *inst
        unsigned int mi_row;
        unsigned int mi_col;
        unsigned int offset;
-       unsigned int pa;
+       dma_addr_t pa;
        unsigned int size;
        struct vdec_vp9_slice_tiles *tiles;
        unsigned char *pos;
@@ -1109,7 +1109,7 @@ static int vdec_vp9_slice_setup_tile_buffer(struct vdec_vp9_slice_instance *inst
        pos = va + offset;
        end = va + bs->size;
        /* truncated */
-       pa = (unsigned int)bs->dma_addr + offset;
+       pa = bs->dma_addr + offset;
        tb = instance->tile.va;
        for (i = 0; i < rows; i++) {
                for (j = 0; j < cols; j++) {
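The hunk above drops a cast of `bs->dma_addr` to `unsigned int`, which silently truncated DMA addresses above 4 GiB. A minimal user-space sketch of the failure mode, with plain integer types standing in for `dma_addr_t`:

```c
/* Sketch of the truncation bug: uint64_t stands in for dma_addr_t,
 * offset is the tile offset as in the tile-buffer setup above. */
#include <assert.h>
#include <stdint.h>

static uint64_t pa_truncated(uint64_t dma_addr, unsigned int offset)
{
	return (unsigned int)dma_addr + offset;	/* old, buggy cast */
}

static uint64_t pa_full(uint64_t dma_addr, unsigned int offset)
{
	return dma_addr + offset;		/* fixed: keep full width */
}
```

Any buffer mapped above the 4 GiB boundary loses its upper address bits through the old cast, which is exactly what the 36-bit address support needs to avoid.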
index 82e57ae983d55777463b4d7b08ac6fc18f3ec675..da6be556727bb18a458e1e59235615dc9b42c05f 100644 (file)
@@ -77,12 +77,14 @@ static bool vpu_dec_check_ap_inst(struct mtk_vcodec_dec_dev *dec_dev, struct vde
        struct mtk_vcodec_dec_ctx *ctx;
        int ret = false;
 
+       mutex_lock(&dec_dev->dev_ctx_lock);
        list_for_each_entry(ctx, &dec_dev->ctx_list, list) {
                if (!IS_ERR_OR_NULL(ctx) && ctx->vpu_inst == vpu) {
                        ret = true;
                        break;
                }
        }
+       mutex_unlock(&dec_dev->dev_ctx_lock);
 
        return ret;
 }
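The dev_ctx_lock changes above all follow one pattern: every traversal or mutation of `ctx_list` takes a dedicated mutex, so a `list_del` in the release path cannot race the lookup in vpu_dec_check_ap_inst(). A user-space sketch of that pattern, with pthreads and a minimal singly linked list standing in for the kernel primitives (names are illustrative):

```c
/* Sketch of the dev_ctx_lock pattern: a dedicated mutex guards every
 * add, delete, and search on the context list. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct ctx { int id; struct ctx *next; };

static pthread_mutex_t dev_ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static struct ctx *ctx_list;

static void ctx_add(struct ctx *c)
{
	pthread_mutex_lock(&dev_ctx_lock);
	c->next = ctx_list;
	ctx_list = c;
	pthread_mutex_unlock(&dev_ctx_lock);
}

static void ctx_del(struct ctx *c)
{
	pthread_mutex_lock(&dev_ctx_lock);
	for (struct ctx **p = &ctx_list; *p; p = &(*p)->next) {
		if (*p == c) {
			*p = c->next;
			break;
		}
	}
	pthread_mutex_unlock(&dev_ctx_lock);
}

/* Mirrors vpu_dec_check_ap_inst(): search only under the lock. */
static bool ctx_present(int id)
{
	bool found = false;

	pthread_mutex_lock(&dev_ctx_lock);
	for (struct ctx *c = ctx_list; c; c = c->next)
		if (c->id == id) {
			found = true;
			break;
		}
	pthread_mutex_unlock(&dev_ctx_lock);
	return found;
}
```

The key point from the patch series is that the existing dev_mutex could not be reused here because the lookup runs from the VPU message handler, so a narrower lock dedicated to the list is introduced instead.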
index 6319f24bc714b5eb3a7018f1e612afcf2dadf25e..3cb8a16222220e2d5480b48b48879112a68fc11f 100644 (file)
@@ -177,7 +177,9 @@ static int fops_vcodec_open(struct file *file)
        mtk_v4l2_venc_dbg(2, ctx, "Create instance [%d]@%p m2m_ctx=%p ",
                          ctx->id, ctx, ctx->m2m_ctx);
 
+       mutex_lock(&dev->dev_ctx_lock);
        list_add(&ctx->list, &dev->ctx_list);
+       mutex_unlock(&dev->dev_ctx_lock);
 
        mutex_unlock(&dev->dev_mutex);
        mtk_v4l2_venc_dbg(0, ctx, "%s encoder [%d]", dev_name(&dev->plat_dev->dev),
@@ -212,7 +214,9 @@ static int fops_vcodec_release(struct file *file)
        v4l2_fh_exit(&ctx->fh);
        v4l2_ctrl_handler_free(&ctx->ctrl_hdl);
 
+       mutex_lock(&dev->dev_ctx_lock);
        list_del_init(&ctx->list);
+       mutex_unlock(&dev->dev_ctx_lock);
        kfree(ctx);
        mutex_unlock(&dev->dev_mutex);
        return 0;
@@ -294,6 +298,7 @@ static int mtk_vcodec_probe(struct platform_device *pdev)
 
        mutex_init(&dev->enc_mutex);
        mutex_init(&dev->dev_mutex);
+       mutex_init(&dev->dev_ctx_lock);
        spin_lock_init(&dev->irqlock);
 
        snprintf(dev->v4l2_dev.name, sizeof(dev->v4l2_dev.name), "%s",
index a042f607ed8d1645a9dc3cf199b89e4280bc8337..0bd85d0fb379acbba3ac07c01e780cf57bef0305 100644 (file)
@@ -178,6 +178,7 @@ struct mtk_vcodec_enc_ctx {
  *
  * @enc_mutex: encoder hardware lock.
  * @dev_mutex: video_device lock
+ * @dev_ctx_lock: lock protecting the context list
  * @encode_workqueue: encode work queue
  *
  * @enc_irq: h264 encoder irq resource
@@ -205,6 +206,7 @@ struct mtk_vcodec_enc_dev {
        /* encoder hardware mutex lock */
        struct mutex enc_mutex;
        struct mutex dev_mutex;
+       struct mutex dev_ctx_lock;
        struct workqueue_struct *encode_workqueue;
 
        int enc_irq;
index 84ad1cc6ad171ef2ea2767653d60e6d779e5604e..51bb7ee141b9e58ac98f940f5e419d9ef4df37ca 100644 (file)
@@ -47,12 +47,14 @@ static bool vpu_enc_check_ap_inst(struct mtk_vcodec_enc_dev *enc_dev, struct ven
        struct mtk_vcodec_enc_ctx *ctx;
        int ret = false;
 
+       mutex_lock(&enc_dev->dev_ctx_lock);
        list_for_each_entry(ctx, &enc_dev->ctx_list, list) {
                if (!IS_ERR_OR_NULL(ctx) && ctx->vpu_inst == vpu) {
                        ret = true;
                        break;
                }
        }
+       mutex_unlock(&enc_dev->dev_ctx_lock);
 
        return ret;
 }
index 97a00ec9a4d48944a8233b49c5fa0106493abb47..caacdc0a3819458fbb47faae23432f5950fe5869 100644 (file)
@@ -209,7 +209,7 @@ static void block2mtd_free_device(struct block2mtd_dev *dev)
 
        if (dev->bdev_file) {
                invalidate_mapping_pages(dev->bdev_file->f_mapping, 0, -1);
-               fput(dev->bdev_file);
+               bdev_fput(dev->bdev_file);
        }
 
        kfree(dev);
index 1035820c2377af7d73d401824734e949b7679c8d..c0d0bce0b5942d67a139091f08441983f30dc074 100644 (file)
@@ -950,20 +950,173 @@ static void mt7530_setup_port5(struct dsa_switch *ds, phy_interface_t interface)
        mutex_unlock(&priv->reg_mutex);
 }
 
-/* On page 205, section "8.6.3 Frame filtering" of the active standard, IEEE Std
- * 802.1Q™-2022, it is stated that frames with 01:80:C2:00:00:00-0F as MAC DA
- * must only be propagated to C-VLAN and MAC Bridge components. That means
- * VLAN-aware and VLAN-unaware bridges. On the switch designs with CPU ports,
- * these frames are supposed to be processed by the CPU (software). So we make
- * the switch only forward them to the CPU port. And if received from a CPU
- * port, forward to a single port. The software is responsible of making the
- * switch conform to the latter by setting a single port as destination port on
- * the special tag.
+/* In Clause 5 of IEEE Std 802-2014, two sublayers of the data link layer (DLL)
+ * of the Open Systems Interconnection basic reference model (OSI/RM) are
+ * described; the medium access control (MAC) and logical link control (LLC)
+ * sublayers. The MAC sublayer is the one facing the physical layer.
  *
- * This switch intellectual property cannot conform to this part of the standard
- * fully. Whilst the REV_UN frame tag covers the remaining :04-0D and :0F MAC
- * DAs, it also includes :22-FF which the scope of propagation is not supposed
- * to be restricted for these MAC DAs.
+ * In 8.2 of IEEE Std 802.1Q-2022, the Bridge architecture is described. A
+ * Bridge component comprises a MAC Relay Entity for interconnecting the Ports
+ * of the Bridge, at least two Ports, and higher layer entities with at least a
+ * Spanning Tree Protocol Entity included.
+ *
+ * Each Bridge Port also functions as an end station and shall provide the MAC
+ * Service to an LLC Entity. Each instance of the MAC Service is provided to a
+ * distinct LLC Entity that supports protocol identification, multiplexing, and
+ * demultiplexing, for protocol data unit (PDU) transmission and reception by
+ * one or more higher layer entities.
+ *
+ * It is described in 8.13.9 of IEEE Std 802.1Q-2022 that in a Bridge, the LLC
+ * Entity associated with each Bridge Port is modeled as being directly
+ * connected to the attached Local Area Network (LAN).
+ *
+ * On the switch with CPU port architecture, CPU port functions as Management
+ * Port, and the Management Port functionality is provided by software which
+ * functions as an end station. Software is connected to an IEEE 802 LAN that is
+ * wholly contained within the system that incorporates the Bridge. Software
+ * provides access to the LLC Entity associated with each Bridge Port by the
+ * value of the source port field on the special tag on the frame received by
+ * software.
+ *
+ * We call frames that carry control information to determine the active
+ * topology and current extent of each Virtual Local Area Network (VLAN), i.e.,
+ * spanning tree or Shortest Path Bridging (SPB) and Multiple VLAN Registration
+ * Protocol Data Units (MVRPDUs), and frames from other link constrained
+ * protocols, such as Extensible Authentication Protocol over LAN (EAPOL) and
+ * Link Layer Discovery Protocol (LLDP), link-local frames. They are not
+ * forwarded by a Bridge. Permanently configured entries in the filtering
+ * database (FDB) ensure that such frames are discarded by the Forwarding
+ * Process. In 8.6.3 of IEEE Std 802.1Q-2022, this is described in detail:
+ *
+ * Each of the reserved MAC addresses specified in Table 8-1
+ * (01-80-C2-00-00-[00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F]) shall be
+ * permanently configured in the FDB in C-VLAN components and ERs.
+ *
+ * Each of the reserved MAC addresses specified in Table 8-2
+ * (01-80-C2-00-00-[01,02,03,04,05,06,07,08,09,0A,0E]) shall be permanently
+ * configured in the FDB in S-VLAN components.
+ *
+ * Each of the reserved MAC addresses specified in Table 8-3
+ * (01-80-C2-00-00-[01,02,04,0E]) shall be permanently configured in the FDB in
+ * TPMR components.
+ *
+ * The FDB entries for reserved MAC addresses shall specify filtering for all
+ * Bridge Ports and all VIDs. Management shall not provide the capability to
+ * modify or remove entries for reserved MAC addresses.
+ *
+ * The addresses in Table 8-1, Table 8-2, and Table 8-3 determine the scope of
+ * propagation of PDUs within a Bridged Network, as follows:
+ *
+ *   The Nearest Bridge group address (01-80-C2-00-00-0E) is an address that no
+ *   conformant Two-Port MAC Relay (TPMR) component, Service VLAN (S-VLAN)
+ *   component, Customer VLAN (C-VLAN) component, or MAC Bridge can forward.
+ *   PDUs transmitted using this destination address, or any other addresses
+ *   that appear in Table 8-1, Table 8-2, and Table 8-3
+ *   (01-80-C2-00-00-[00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F]), can
+ *   therefore travel no further than those stations that can be reached via a
+ *   single individual LAN from the originating station.
+ *
+ *   The Nearest non-TPMR Bridge group address (01-80-C2-00-00-03), is an
+ *   address that no conformant S-VLAN component, C-VLAN component, or MAC
+ *   Bridge can forward; however, this address is relayed by a TPMR component.
+ *   PDUs using this destination address, or any of the other addresses that
+ *   appear in both Table 8-1 and Table 8-2 but not in Table 8-3
+ *   (01-80-C2-00-00-[00,03,05,06,07,08,09,0A,0B,0C,0D,0F]), will be relayed by
+ *   any TPMRs but will propagate no further than the nearest S-VLAN component,
+ *   C-VLAN component, or MAC Bridge.
+ *
+ *   The Nearest Customer Bridge group address (01-80-C2-00-00-00) is an address
+ * that no conformant C-VLAN component or MAC Bridge can forward; however, it is
+ *   relayed by TPMR components and S-VLAN components. PDUs using this
+ *   destination address, or any of the other addresses that appear in Table 8-1
+ *   but not in either Table 8-2 or Table 8-3 (01-80-C2-00-00-[00,0B,0C,0D,0F]),
+ *   will be relayed by TPMR components and S-VLAN components but will propagate
+ *   no further than the nearest C-VLAN component or MAC Bridge.
+ *
+ * Because the LLC Entity associated with each Bridge Port is provided via CPU
+ * port, we must not filter these frames but forward them to CPU port.
+ *
+ * In a Bridge, the transmission Port is mainly determined by ingress and egress
+ * rules, FDB, and spanning tree Port State functions of the Forwarding Process.
+ * For link-local frames, only CPU port should be designated as destination port
+ * in the FDB, and the other functions of the Forwarding Process must not
+ * interfere with the decision of the transmission Port. We call this process
+ * trapping frames to CPU port.
+ *
+ * Therefore, on the switch with CPU port architecture, link-local frames must
+ * be trapped to CPU port, and certain link-local frames received by a Port of a
+ * Bridge comprising a TPMR component or an S-VLAN component must be excluded
+ * from it.
+ *
+ * A Bridge of the switch with CPU port architecture cannot comprise a Two-Port
+ * MAC Relay (TPMR) component as a TPMR component supports only a subset of the
+ * functionality of a MAC Bridge. A Bridge comprising two Ports (Management Port
+ * doesn't count) of this architecture will either function as a standard MAC
+ * Bridge or a standard VLAN Bridge.
+ *
+ * Therefore, a Bridge of this architecture can only comprise S-VLAN components,
+ * C-VLAN components, or MAC Bridge components. Since there's no TPMR component,
+ * we don't need to relay PDUs using the destination addresses specified on the
+ * Nearest non-TPMR section, and the portion of the Nearest Customer Bridge
+ * section where they must be relayed by TPMR components.
+ *
+ * One option to trap link-local frames to CPU port is to add static FDB entries
+ * with CPU port designated as destination port. However, because
+ * Independent VLAN Learning (IVL) is used on every VID, each entry only
+ * applies to a single VLAN Identifier (VID). For a Bridge comprising a MAC
+ * Bridge component or a C-VLAN component, there would have to be 16 times 4096
+ * entries. This switch intellectual property can only hold a maximum of 2048
+ * entries. Using this option, there also isn't a mechanism to prevent
+ * link-local frames from being discarded when the spanning tree Port State of
+ * the reception Port is discarding.
+ *
+ * The remaining option is to utilise the BPC, RGAC1, RGAC2, RGAC3, and RGAC4
+ * registers. Whilst this applies to every VID, it doesn't contain all of the
+ * reserved MAC addresses without affecting the remaining Standard Group MAC
+ * Addresses. The REV_UN frame tag utilised using the RGAC4 register covers the
+ * remaining 01-80-C2-00-00-[04,05,06,07,08,09,0A,0B,0C,0D,0F] destination
+ * addresses. It also includes the 01-80-C2-00-00-22 to 01-80-C2-00-00-FF
+ * destination addresses which may be relayed by MAC Bridges or VLAN Bridges.
+ * The latter option provides better but not complete conformance.
+ *
+ * This switch intellectual property also does not provide a mechanism to trap
+ * link-local frames with specific destination addresses to CPU port by Bridge,
+ * to conform to the filtering rules for the distinct Bridge components.
+ *
+ * Therefore, regardless of the type of the Bridge component, link-local frames
+ * with these destination addresses will be trapped to CPU port:
+ *
+ * 01-80-C2-00-00-[00,01,02,03,0E]
+ *
+ * In a Bridge comprising a MAC Bridge component or a C-VLAN component:
+ *
+ *   Link-local frames with these destination addresses won't be trapped to CPU
+ *   port which won't conform to IEEE Std 802.1Q-2022:
+ *
+ *   01-80-C2-00-00-[04,05,06,07,08,09,0A,0B,0C,0D,0F]
+ *
+ * In a Bridge comprising an S-VLAN component:
+ *
+ *   Link-local frames with these destination addresses will be trapped to CPU
+ *   port which won't conform to IEEE Std 802.1Q-2022:
+ *
+ *   01-80-C2-00-00-00
+ *
+ *   Link-local frames with these destination addresses won't be trapped to CPU
+ *   port which won't conform to IEEE Std 802.1Q-2022:
+ *
+ *   01-80-C2-00-00-[04,05,06,07,08,09,0A]
+ *
+ * To trap link-local frames to CPU port as conformant as this switch
+ * intellectual property can allow, link-local frames are made to be regarded as
+ * Bridge Protocol Data Units (BPDUs). This is because this switch intellectual
+ * property only lets the frames regarded as BPDUs bypass the spanning tree Port
+ * State function of the Forwarding Process.
+ *
+ * The only remaining interference is the ingress rules. When the reception Port
+ * has no PVID assigned on software, VLAN-untagged frames won't be allowed in.
+ * There doesn't seem to be a mechanism on the switch intellectual property to
+ * have link-local frames bypass this function of the Forwarding Process.
  */
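The propagation rules spelled out in the comment above reduce to set membership on the last octet of 01-80-C2-00-00-XX against Tables 8-1, 8-2, and 8-3. An illustrative user-space encoding of those rules (helper names are made up for the sketch, not part of the driver):

```c
/* Illustrative encoding of the IEEE Std 802.1Q-2022 Table 8-1/8-2/8-3
 * propagation scopes described above, keyed on the last octet of the
 * reserved 01-80-C2-00-00-XX destination address. */
#include <assert.h>
#include <stdbool.h>

/* Table 8-3: filtered even by TPMR components. */
static bool in_table_8_3(unsigned int last)
{
	return last == 0x01 || last == 0x02 || last == 0x04 || last == 0x0E;
}

/* Table 8-2: filtered by S-VLAN components (01..0A plus 0E). */
static bool in_table_8_2(unsigned int last)
{
	return (last >= 0x01 && last <= 0x0A) || last == 0x0E;
}

/* Table 8-1: filtered by C-VLAN components and MAC Bridges (00..0F). */
static bool in_table_8_1(unsigned int last)
{
	return last <= 0x0F;
}

/* 0: travels a single LAN only; 1: relayed by TPMRs only;
 * 2: relayed by TPMRs and S-VLAN components; -1: not reserved. */
static int propagation_scope(unsigned int last)
{
	if (in_table_8_3(last))
		return 0;
	if (in_table_8_2(last))
		return 1;
	return in_table_8_1(last) ? 2 : -1;
}
```

This mirrors why the driver must trap the whole reserved block to the CPU port: the scope depends on which Bridge component receives the frame, and the hardware cannot distinguish components per Bridge.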
 static void
 mt753x_trap_frames(struct mt7530_priv *priv)
@@ -971,35 +1124,43 @@ mt753x_trap_frames(struct mt7530_priv *priv)
        /* Trap 802.1X PAE frames and BPDUs to the CPU port(s) and egress them
         * VLAN-untagged.
         */
-       mt7530_rmw(priv, MT753X_BPC, MT753X_PAE_EG_TAG_MASK |
-                  MT753X_PAE_PORT_FW_MASK | MT753X_BPDU_EG_TAG_MASK |
-                  MT753X_BPDU_PORT_FW_MASK,
-                  MT753X_PAE_EG_TAG(MT7530_VLAN_EG_UNTAGGED) |
-                  MT753X_PAE_PORT_FW(MT753X_BPDU_CPU_ONLY) |
-                  MT753X_BPDU_EG_TAG(MT7530_VLAN_EG_UNTAGGED) |
-                  MT753X_BPDU_CPU_ONLY);
+       mt7530_rmw(priv, MT753X_BPC,
+                  MT753X_PAE_BPDU_FR | MT753X_PAE_EG_TAG_MASK |
+                          MT753X_PAE_PORT_FW_MASK | MT753X_BPDU_EG_TAG_MASK |
+                          MT753X_BPDU_PORT_FW_MASK,
+                  MT753X_PAE_BPDU_FR |
+                          MT753X_PAE_EG_TAG(MT7530_VLAN_EG_UNTAGGED) |
+                          MT753X_PAE_PORT_FW(MT753X_BPDU_CPU_ONLY) |
+                          MT753X_BPDU_EG_TAG(MT7530_VLAN_EG_UNTAGGED) |
+                          MT753X_BPDU_CPU_ONLY);
 
        /* Trap frames with :01 and :02 MAC DAs to the CPU port(s) and egress
         * them VLAN-untagged.
         */
-       mt7530_rmw(priv, MT753X_RGAC1, MT753X_R02_EG_TAG_MASK |
-                  MT753X_R02_PORT_FW_MASK | MT753X_R01_EG_TAG_MASK |
-                  MT753X_R01_PORT_FW_MASK,
-                  MT753X_R02_EG_TAG(MT7530_VLAN_EG_UNTAGGED) |
-                  MT753X_R02_PORT_FW(MT753X_BPDU_CPU_ONLY) |
-                  MT753X_R01_EG_TAG(MT7530_VLAN_EG_UNTAGGED) |
-                  MT753X_BPDU_CPU_ONLY);
+       mt7530_rmw(priv, MT753X_RGAC1,
+                  MT753X_R02_BPDU_FR | MT753X_R02_EG_TAG_MASK |
+                          MT753X_R02_PORT_FW_MASK | MT753X_R01_BPDU_FR |
+                          MT753X_R01_EG_TAG_MASK | MT753X_R01_PORT_FW_MASK,
+                  MT753X_R02_BPDU_FR |
+                          MT753X_R02_EG_TAG(MT7530_VLAN_EG_UNTAGGED) |
+                          MT753X_R02_PORT_FW(MT753X_BPDU_CPU_ONLY) |
+                          MT753X_R01_BPDU_FR |
+                          MT753X_R01_EG_TAG(MT7530_VLAN_EG_UNTAGGED) |
+                          MT753X_BPDU_CPU_ONLY);
 
        /* Trap frames with :03 and :0E MAC DAs to the CPU port(s) and egress
         * them VLAN-untagged.
         */
-       mt7530_rmw(priv, MT753X_RGAC2, MT753X_R0E_EG_TAG_MASK |
-                  MT753X_R0E_PORT_FW_MASK | MT753X_R03_EG_TAG_MASK |
-                  MT753X_R03_PORT_FW_MASK,
-                  MT753X_R0E_EG_TAG(MT7530_VLAN_EG_UNTAGGED) |
-                  MT753X_R0E_PORT_FW(MT753X_BPDU_CPU_ONLY) |
-                  MT753X_R03_EG_TAG(MT7530_VLAN_EG_UNTAGGED) |
-                  MT753X_BPDU_CPU_ONLY);
+       mt7530_rmw(priv, MT753X_RGAC2,
+                  MT753X_R0E_BPDU_FR | MT753X_R0E_EG_TAG_MASK |
+                          MT753X_R0E_PORT_FW_MASK | MT753X_R03_BPDU_FR |
+                          MT753X_R03_EG_TAG_MASK | MT753X_R03_PORT_FW_MASK,
+                  MT753X_R0E_BPDU_FR |
+                          MT753X_R0E_EG_TAG(MT7530_VLAN_EG_UNTAGGED) |
+                          MT753X_R0E_PORT_FW(MT753X_BPDU_CPU_ONLY) |
+                          MT753X_R03_BPDU_FR |
+                          MT753X_R03_EG_TAG(MT7530_VLAN_EG_UNTAGGED) |
+                          MT753X_BPDU_CPU_ONLY);
 }
 
 static void
@@ -2505,18 +2666,25 @@ mt7531_setup(struct dsa_switch *ds)
        mt7530_rmw(priv, MT7531_GPIO_MODE0, MT7531_GPIO0_MASK,
                   MT7531_GPIO0_INTERRUPT);
 
-       /* Enable PHY core PLL, since phy_device has not yet been created
-        * provided for phy_[read,write]_mmd_indirect is called, we provide
-        * our own mt7531_ind_mmd_phy_[read,write] to complete this
-        * function.
+       /* Enable Energy-Efficient Ethernet (EEE) and the PHY core PLL. Since
+        * no phy_device has been created yet for phy_[read,write]_mmd_indirect
+        * to use, we provide our own mt7531_ind_mmd_phy_[read,write] helpers
+        * to complete this function.
         */
        val = mt7531_ind_c45_phy_read(priv, MT753X_CTRL_PHY_ADDR,
                                      MDIO_MMD_VEND2, CORE_PLL_GROUP4);
-       val |= MT7531_PHY_PLL_BYPASS_MODE;
+       val |= MT7531_RG_SYSPLL_DMY2 | MT7531_PHY_PLL_BYPASS_MODE;
        val &= ~MT7531_PHY_PLL_OFF;
        mt7531_ind_c45_phy_write(priv, MT753X_CTRL_PHY_ADDR, MDIO_MMD_VEND2,
                                 CORE_PLL_GROUP4, val);
 
+       /* Disable EEE advertisement on the switch PHYs. */
+       for (i = MT753X_CTRL_PHY_ADDR;
+            i < MT753X_CTRL_PHY_ADDR + MT7530_NUM_PHYS; i++) {
+               mt7531_ind_c45_phy_write(priv, i, MDIO_MMD_AN, MDIO_AN_EEE_ADV,
+                                        0);
+       }
+
        mt7531_setup_common(ds);
 
        /* Setup VLAN ID 0 for VLAN-unaware bridges */
index d17b318e6ee4882ed8b6f6668eaa57a99a38d184..585db03c054878f85ba69e6546cfc885491d8f4e 100644 (file)
@@ -65,6 +65,7 @@ enum mt753x_id {
 
 /* Registers for BPDU and PAE frame control */
 #define MT753X_BPC                     0x24
+#define  MT753X_PAE_BPDU_FR            BIT(25)
 #define  MT753X_PAE_EG_TAG_MASK                GENMASK(24, 22)
 #define  MT753X_PAE_EG_TAG(x)          FIELD_PREP(MT753X_PAE_EG_TAG_MASK, x)
 #define  MT753X_PAE_PORT_FW_MASK       GENMASK(18, 16)
@@ -75,20 +76,24 @@ enum mt753x_id {
 
 /* Register for :01 and :02 MAC DA frame control */
 #define MT753X_RGAC1                   0x28
+#define  MT753X_R02_BPDU_FR            BIT(25)
 #define  MT753X_R02_EG_TAG_MASK                GENMASK(24, 22)
 #define  MT753X_R02_EG_TAG(x)          FIELD_PREP(MT753X_R02_EG_TAG_MASK, x)
 #define  MT753X_R02_PORT_FW_MASK       GENMASK(18, 16)
 #define  MT753X_R02_PORT_FW(x)         FIELD_PREP(MT753X_R02_PORT_FW_MASK, x)
+#define  MT753X_R01_BPDU_FR            BIT(9)
 #define  MT753X_R01_EG_TAG_MASK                GENMASK(8, 6)
 #define  MT753X_R01_EG_TAG(x)          FIELD_PREP(MT753X_R01_EG_TAG_MASK, x)
 #define  MT753X_R01_PORT_FW_MASK       GENMASK(2, 0)
 
 /* Register for :03 and :0E MAC DA frame control */
 #define MT753X_RGAC2                   0x2c
+#define  MT753X_R0E_BPDU_FR            BIT(25)
 #define  MT753X_R0E_EG_TAG_MASK                GENMASK(24, 22)
 #define  MT753X_R0E_EG_TAG(x)          FIELD_PREP(MT753X_R0E_EG_TAG_MASK, x)
 #define  MT753X_R0E_PORT_FW_MASK       GENMASK(18, 16)
 #define  MT753X_R0E_PORT_FW(x)         FIELD_PREP(MT753X_R0E_PORT_FW_MASK, x)
+#define  MT753X_R03_BPDU_FR            BIT(9)
 #define  MT753X_R03_EG_TAG_MASK                GENMASK(8, 6)
 #define  MT753X_R03_EG_TAG(x)          FIELD_PREP(MT753X_R03_EG_TAG_MASK, x)
 #define  MT753X_R03_PORT_FW_MASK       GENMASK(2, 0)
@@ -616,6 +621,7 @@ enum mt7531_clk_skew {
 #define  RG_SYSPLL_DDSFBK_EN           BIT(12)
 #define  RG_SYSPLL_BIAS_EN             BIT(11)
 #define  RG_SYSPLL_BIAS_LPF_EN         BIT(10)
+#define  MT7531_RG_SYSPLL_DMY2         BIT(6)
 #define  MT7531_PHY_PLL_OFF            BIT(5)
 #define  MT7531_PHY_PLL_BYPASS_MODE    BIT(4)
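The register definitions above build field masks with GENMASK() and pack values with FIELD_PREP(), and mt7530_rmw() applies them as clear-mask-then-OR-value. A minimal user-space re-creation of that machinery (32-bit only, as a sketch rather than the kernel's actual bitfield.h implementation):

```c
/* User-space stand-ins for GENMASK()/FIELD_PREP() and the
 * read-modify-write shape of mt7530_rmw(). */
#include <assert.h>
#include <stdint.h>

/* Bits h..l inclusive, h >= l, for 32-bit registers. */
#define GENMASK32(h, l) (((~0u) >> (31 - (h))) & ((~0u) << (l)))

/* Shift x into the field described by mask (mask must be contiguous). */
#define FIELD_PREP32(mask, x) \
	(((uint32_t)(x) << __builtin_ctz(mask)) & (mask))

/* mt7530_rmw()-style update: clear the masked field, OR in the value. */
static uint32_t rmw32(uint32_t reg, uint32_t mask, uint32_t val)
{
	return (reg & ~mask) | val;
}
```

This is why each mt7530_rmw() call in the trap_frames hunks passes both the union of the field masks and the union of the prepared values: unrelated bits in BPC/RGAC1/RGAC2 are left untouched.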
 
index 9e9e4a03f1a8c9bd4c8d68cb20340157c4547e80..2d8a66ea82fab7f0a023ab469ccc33321d0b4ba3 100644 (file)
@@ -351,7 +351,7 @@ static int ena_com_init_io_sq(struct ena_com_dev *ena_dev,
                        ENA_COM_BOUNCE_BUFFER_CNTRL_CNT;
                io_sq->bounce_buf_ctrl.next_to_use = 0;
 
-               size = io_sq->bounce_buf_ctrl.buffer_size *
+               size = (size_t)io_sq->bounce_buf_ctrl.buffer_size *
                        io_sq->bounce_buf_ctrl.buffers_num;
 
                dev_node = dev_to_node(ena_dev->dmadev);
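The one-character change above is an integer-overflow fix: `buffer_size` and `buffers_num` are 32-bit fields, so their product was previously computed modulo 2^32 before being assigned to `size`. A minimal standalone sketch (hypothetical values, not the driver's real configuration) of why the widening cast must come before the multiply:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit operands: 2^20 * 2^13 = 2^33, which does not
 * fit in a u32. */
static uint32_t mul_wrapped(uint32_t a, uint32_t b)
{
	return a * b;		/* product wraps modulo 2^32 */
}

static size_t mul_widened(uint32_t a, uint32_t b)
{
	return (size_t)a * b;	/* multiply performed in size_t */
}
```

On LP64 targets `size_t` is 64-bit, so casting one operand is enough: the usual arithmetic conversions widen the other operand before the multiplication, matching the fixed driver line.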
index 09e7da1a69c9f0c8141e03be445c2589e9d3a999..be5acfa41ee0ce4d80605e0bcdc6dc743c421f42 100644 (file)
@@ -718,8 +718,11 @@ void ena_unmap_tx_buff(struct ena_ring *tx_ring,
 static void ena_free_tx_bufs(struct ena_ring *tx_ring)
 {
        bool print_once = true;
+       bool is_xdp_ring;
        u32 i;
 
+       is_xdp_ring = ENA_IS_XDP_INDEX(tx_ring->adapter, tx_ring->qid);
+
        for (i = 0; i < tx_ring->ring_size; i++) {
                struct ena_tx_buffer *tx_info = &tx_ring->tx_buffer_info[i];
 
@@ -739,10 +742,15 @@ static void ena_free_tx_bufs(struct ena_ring *tx_ring)
 
                ena_unmap_tx_buff(tx_ring, tx_info);
 
-               dev_kfree_skb_any(tx_info->skb);
+               if (is_xdp_ring)
+                       xdp_return_frame(tx_info->xdpf);
+               else
+                       dev_kfree_skb_any(tx_info->skb);
        }
-       netdev_tx_reset_queue(netdev_get_tx_queue(tx_ring->netdev,
-                                                 tx_ring->qid));
+
+       if (!is_xdp_ring)
+               netdev_tx_reset_queue(netdev_get_tx_queue(tx_ring->netdev,
+                                                         tx_ring->qid));
 }
 
 static void ena_free_all_tx_bufs(struct ena_adapter *adapter)
@@ -3481,10 +3489,11 @@ static void check_for_missing_completions(struct ena_adapter *adapter)
 {
        struct ena_ring *tx_ring;
        struct ena_ring *rx_ring;
-       int i, budget, rc;
+       int qid, budget, rc;
        int io_queue_count;
 
        io_queue_count = adapter->xdp_num_queues + adapter->num_io_queues;
+
        /* Make sure the driver doesn't turn the device in other process */
        smp_rmb();
 
@@ -3497,27 +3506,29 @@ static void check_for_missing_completions(struct ena_adapter *adapter)
        if (adapter->missing_tx_completion_to == ENA_HW_HINTS_NO_TIMEOUT)
                return;
 
-       budget = ENA_MONITORED_TX_QUEUES;
+       budget = min_t(u32, io_queue_count, ENA_MONITORED_TX_QUEUES);
 
-       for (i = adapter->last_monitored_tx_qid; i < io_queue_count; i++) {
-               tx_ring = &adapter->tx_ring[i];
-               rx_ring = &adapter->rx_ring[i];
+       qid = adapter->last_monitored_tx_qid;
+
+       while (budget) {
+               qid = (qid + 1) % io_queue_count;
+
+               tx_ring = &adapter->tx_ring[qid];
+               rx_ring = &adapter->rx_ring[qid];
 
                rc = check_missing_comp_in_tx_queue(adapter, tx_ring);
                if (unlikely(rc))
                        return;
 
-               rc =  !ENA_IS_XDP_INDEX(adapter, i) ?
+               rc =  !ENA_IS_XDP_INDEX(adapter, qid) ?
                        check_for_rx_interrupt_queue(adapter, rx_ring) : 0;
                if (unlikely(rc))
                        return;
 
                budget--;
-               if (!budget)
-                       break;
        }
 
-       adapter->last_monitored_tx_qid = i % io_queue_count;
+       adapter->last_monitored_tx_qid = qid;
 }
 
 /* trigger napi schedule after 2 consecutive detections */
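The loop rewrite in `check_for_missing_completions()` above turns a linear scan that restarted from `last_monitored_tx_qid` into a true round-robin: the budget is clamped to the queue count and `qid` always advances modulo `io_queue_count`, so every queue is eventually visited regardless of where the previous call stopped. A minimal sketch of the iteration pattern (the `order` output array is an illustration aid, not driver code):

```c
/* Visit up to `budget` queues after `last`, wrapping modulo `count`;
 * record the visit order and return the last qid visited so the next
 * call resumes where this one stopped. */
static int scan_round_robin(int last, int count, int budget, int *order)
{
	int qid = last;
	int i = 0;

	while (budget--) {
		qid = (qid + 1) % count;
		order[i++] = qid;
	}
	return qid;
}
```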
index 337c435d3ce998b1b8f69a86f8be7997e1ff99c8..5b175e7e92a10ba19917b9c5e63d89bc1f2a8dd5 100644 (file)
@@ -89,7 +89,7 @@ int ena_xdp_xmit_frame(struct ena_ring *tx_ring,
 
        rc = ena_xdp_tx_map_frame(tx_ring, tx_info, xdpf, &ena_tx_ctx);
        if (unlikely(rc))
-               return rc;
+               goto err;
 
        ena_tx_ctx.req_id = req_id;
 
@@ -112,7 +112,9 @@ int ena_xdp_xmit_frame(struct ena_ring *tx_ring,
 
 error_unmap_dma:
        ena_unmap_tx_buff(tx_ring, tx_info);
+err:
        tx_info->xdpf = NULL;
+
        return rc;
 }
 
index 86f1854698b4e80816b1b55e1d4f4d31fdf0737f..883c044852f1df39852b50b50a80ab31c7bfb091 100644 (file)
@@ -95,9 +95,15 @@ static inline void mlx5e_ptp_metadata_fifo_push(struct mlx5e_ptp_metadata_fifo *
 }
 
 static inline u8
+mlx5e_ptp_metadata_fifo_peek(struct mlx5e_ptp_metadata_fifo *fifo)
+{
+       return fifo->data[fifo->mask & fifo->cc];
+}
+
+static inline void
 mlx5e_ptp_metadata_fifo_pop(struct mlx5e_ptp_metadata_fifo *fifo)
 {
-       return fifo->data[fifo->mask & fifo->cc++];
+       fifo->cc++;
 }
 
 static inline void
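Splitting the destructive `pop` into a non-destructive `peek` plus an explicit `pop` lets the transmit path read the next PTP metadata index while building the descriptor and consume it only once the WQE is actually committed; on an error path nothing was consumed, so no push-back is needed (this pairs with the later `en/tx.c` hunks that move the pop and delete the error-path push). A minimal ring-buffer sketch of the pattern (u8 slots and power-of-two masking are simplifications of the driver's fifo):

```c
#include <stdint.h>

struct meta_fifo {
	uint8_t data[8];
	uint32_t mask;		/* size - 1, size a power of two */
	uint32_t pc;		/* producer counter */
	uint32_t cc;		/* consumer counter */
};

static void fifo_push(struct meta_fifo *f, uint8_t v)
{
	f->data[f->mask & f->pc++] = v;
}

/* Read the head without consuming it: safe to call before an
 * operation that may still fail. */
static uint8_t fifo_peek(struct meta_fifo *f)
{
	return f->data[f->mask & f->cc];
}

/* Commit the consumption only once failure is no longer possible. */
static void fifo_pop(struct meta_fifo *f)
{
	f->cc++;
}
```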
index e87e26f2c669c2e39f59a9f656e643fce2b48aae..6743806b8480602a8d0a3d02cddcf2d2c1c82199 100644 (file)
@@ -83,24 +83,25 @@ int mlx5e_open_qos_sq(struct mlx5e_priv *priv, struct mlx5e_channels *chs,
 
        txq_ix = mlx5e_qid_from_qos(chs, node_qid);
 
-       WARN_ON(node_qid > priv->htb_max_qos_sqs);
-       if (node_qid == priv->htb_max_qos_sqs) {
-               struct mlx5e_sq_stats *stats, **stats_list = NULL;
-
-               if (priv->htb_max_qos_sqs == 0) {
-                       stats_list = kvcalloc(mlx5e_qos_max_leaf_nodes(priv->mdev),
-                                             sizeof(*stats_list),
-                                             GFP_KERNEL);
-                       if (!stats_list)
-                               return -ENOMEM;
-               }
+       WARN_ON(node_qid >= mlx5e_htb_cur_leaf_nodes(priv->htb));
+       if (!priv->htb_qos_sq_stats) {
+               struct mlx5e_sq_stats **stats_list;
+
+               stats_list = kvcalloc(mlx5e_qos_max_leaf_nodes(priv->mdev),
+                                     sizeof(*stats_list), GFP_KERNEL);
+               if (!stats_list)
+                       return -ENOMEM;
+
+               WRITE_ONCE(priv->htb_qos_sq_stats, stats_list);
+       }
+
+       if (!priv->htb_qos_sq_stats[node_qid]) {
+               struct mlx5e_sq_stats *stats;
+
                stats = kzalloc(sizeof(*stats), GFP_KERNEL);
-               if (!stats) {
-                       kvfree(stats_list);
+               if (!stats)
                        return -ENOMEM;
-               }
-               if (stats_list)
-                       WRITE_ONCE(priv->htb_qos_sq_stats, stats_list);
+
                WRITE_ONCE(priv->htb_qos_sq_stats[node_qid], stats);
                /* Order htb_max_qos_sqs increment after writing the array pointer.
                 * Pairs with smp_load_acquire in en_stats.c.
index bcafb4bf94154ff01969fc851852d4234f5b0c04..8d9a3b5ec973b39aaa1addc8f6c5e3a568a7ab1a 100644 (file)
@@ -179,6 +179,13 @@ u32 mlx5e_rqt_size(struct mlx5_core_dev *mdev, unsigned int num_channels)
        return min_t(u32, rqt_size, max_cap_rqt_size);
 }
 
+#define MLX5E_MAX_RQT_SIZE_ALLOWED_WITH_XOR8_HASH 256
+
+unsigned int mlx5e_rqt_max_num_channels_allowed_for_xor8(void)
+{
+       return MLX5E_MAX_RQT_SIZE_ALLOWED_WITH_XOR8_HASH / MLX5E_UNIFORM_SPREAD_RQT_FACTOR;
+}
+
 void mlx5e_rqt_destroy(struct mlx5e_rqt *rqt)
 {
        mlx5_core_destroy_rqt(rqt->mdev, rqt->rqtn);
index e0bc30308c77000038d151a7caa206c516a6fe9a..2f9e04a8418f143fbf3d01423721d85b2d5b5a2a 100644 (file)
@@ -38,6 +38,7 @@ static inline u32 mlx5e_rqt_get_rqtn(struct mlx5e_rqt *rqt)
 }
 
 u32 mlx5e_rqt_size(struct mlx5_core_dev *mdev, unsigned int num_channels);
+unsigned int mlx5e_rqt_max_num_channels_allowed_for_xor8(void);
 int mlx5e_rqt_redirect_direct(struct mlx5e_rqt *rqt, u32 rqn, u32 *vhca_id);
 int mlx5e_rqt_redirect_indir(struct mlx5e_rqt *rqt, u32 *rqns, u32 *vhca_ids,
                             unsigned int num_rqns,
index f675b1926340f9ca4218aa47febac7c5139ab0e9..f66bbc8464645efabc08ebf923fada5e1f79c5fe 100644 (file)
@@ -57,6 +57,7 @@ int mlx5e_selq_init(struct mlx5e_selq *selq, struct mutex *state_lock)
 
 void mlx5e_selq_cleanup(struct mlx5e_selq *selq)
 {
+       mutex_lock(selq->state_lock);
        WARN_ON_ONCE(selq->is_prepared);
 
        kvfree(selq->standby);
@@ -67,6 +68,7 @@ void mlx5e_selq_cleanup(struct mlx5e_selq *selq)
 
        kvfree(selq->standby);
        selq->standby = NULL;
+       mutex_unlock(selq->state_lock);
 }
 
 void mlx5e_selq_prepare_params(struct mlx5e_selq *selq, struct mlx5e_params *params)
index cc51ce16df14abe530910e063b9072c9e23ff49c..8f101181648c6a294332b6229946b1afae4ee554 100644 (file)
@@ -451,6 +451,34 @@ int mlx5e_ethtool_set_channels(struct mlx5e_priv *priv,
 
        mutex_lock(&priv->state_lock);
 
+       if (mlx5e_rx_res_get_current_hash(priv->rx_res).hfunc == ETH_RSS_HASH_XOR) {
+               unsigned int xor8_max_channels = mlx5e_rqt_max_num_channels_allowed_for_xor8();
+
+               if (count > xor8_max_channels) {
+                       err = -EINVAL;
+                       netdev_err(priv->netdev, "%s: Requested number of channels (%d) exceeds the maximum allowed by the XOR8 RSS hfunc (%d)\n",
+                                  __func__, count, xor8_max_channels);
+                       goto out;
+               }
+       }
+
+       /* If RXFH is configured, changing the channels number is allowed only if
+        * it does not require resizing the RSS table. This is because the previous
+        * configuration may no longer be compatible with the new RSS table.
+        */
+       if (netif_is_rxfh_configured(priv->netdev)) {
+               int cur_rqt_size = mlx5e_rqt_size(priv->mdev, cur_params->num_channels);
+               int new_rqt_size = mlx5e_rqt_size(priv->mdev, count);
+
+               if (new_rqt_size != cur_rqt_size) {
+                       err = -EINVAL;
+                       netdev_err(priv->netdev,
+                                  "%s: RXFH is configured, block changing channels number that affects RSS table size (new: %d, current: %d)\n",
+                                  __func__, new_rqt_size, cur_rqt_size);
+                       goto out;
+               }
+       }
+
        /* Don't allow changing the number of channels if HTB offload is active,
         * because the numeration of the QoS SQs will change, while per-queue
         * qdiscs are attached.
@@ -1281,17 +1309,30 @@ int mlx5e_set_rxfh(struct net_device *dev, struct ethtool_rxfh_param *rxfh,
        struct mlx5e_priv *priv = netdev_priv(dev);
        u32 *rss_context = &rxfh->rss_context;
        u8 hfunc = rxfh->hfunc;
+       unsigned int count;
        int err;
 
        mutex_lock(&priv->state_lock);
+
+       count = priv->channels.params.num_channels;
+
+       if (hfunc == ETH_RSS_HASH_XOR) {
+               unsigned int xor8_max_channels = mlx5e_rqt_max_num_channels_allowed_for_xor8();
+
+               if (count > xor8_max_channels) {
+                       err = -EINVAL;
+                       netdev_err(priv->netdev, "%s: Cannot set RSS hash function to XOR, current number of channels (%d) exceeds the maximum allowed for XOR8 RSS hfunc (%d)\n",
+                                  __func__, count, xor8_max_channels);
+                       goto unlock;
+               }
+       }
+
        if (*rss_context && rxfh->rss_delete) {
                err = mlx5e_rx_res_rss_destroy(priv->rx_res, *rss_context);
                goto unlock;
        }
 
        if (*rss_context == ETH_RXFH_CONTEXT_ALLOC) {
-               unsigned int count = priv->channels.params.num_channels;
-
                err = mlx5e_rx_res_rss_init(priv->rx_res, rss_context, count);
                if (err)
                        goto unlock;
index 91848eae45655fd57d7dcc3365f8ad5094f61f3d..b375ef268671ab01215e370149d1a3b9cb86b756 100644 (file)
@@ -5726,9 +5726,7 @@ void mlx5e_priv_cleanup(struct mlx5e_priv *priv)
        kfree(priv->tx_rates);
        kfree(priv->txq2sq);
        destroy_workqueue(priv->wq);
-       mutex_lock(&priv->state_lock);
        mlx5e_selq_cleanup(&priv->selq);
-       mutex_unlock(&priv->state_lock);
        free_cpumask_var(priv->scratchpad.cpumask);
 
        for (i = 0; i < priv->htb_max_qos_sqs; i++)
index 2fa076b23fbead06bceb6697e0ebb0238bb5be7e..e21a3b4128ce880478795b023e1ff314e9336dd0 100644 (file)
@@ -398,6 +398,8 @@ mlx5e_txwqe_complete(struct mlx5e_txqsq *sq, struct sk_buff *skb,
                     (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))) {
                u8 metadata_index = be32_to_cpu(eseg->flow_table_metadata);
 
+               mlx5e_ptp_metadata_fifo_pop(&sq->ptpsq->metadata_freelist);
+
                mlx5e_skb_cb_hwtstamp_init(skb);
                mlx5e_ptp_metadata_map_put(&sq->ptpsq->metadata_map, skb,
                                           metadata_index);
@@ -496,9 +498,6 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 
 err_drop:
        stats->dropped++;
-       if (unlikely(sq->ptpsq && (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)))
-               mlx5e_ptp_metadata_fifo_push(&sq->ptpsq->metadata_freelist,
-                                            be32_to_cpu(eseg->flow_table_metadata));
        dev_kfree_skb_any(skb);
        mlx5e_tx_flush(sq);
 }
@@ -657,7 +656,7 @@ static void mlx5e_cqe_ts_id_eseg(struct mlx5e_ptpsq *ptpsq, struct sk_buff *skb,
 {
        if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
                eseg->flow_table_metadata =
-                       cpu_to_be32(mlx5e_ptp_metadata_fifo_pop(&ptpsq->metadata_freelist));
+                       cpu_to_be32(mlx5e_ptp_metadata_fifo_peek(&ptpsq->metadata_freelist));
 }
 
 static void mlx5e_txwqe_build_eseg(struct mlx5e_priv *priv, struct mlx5e_txqsq *sq,
index 3047d7015c5256726338904432ce56845c59c39c..1789800faaeb62841387ed69b0a82aab3283bf46 100644 (file)
@@ -1868,6 +1868,7 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
        if (err)
                goto abort;
 
+       dev->priv.eswitch = esw;
        err = esw_offloads_init(esw);
        if (err)
                goto reps_err;
@@ -1892,11 +1893,6 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
                esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_BASIC;
        else
                esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_NONE;
-       if (MLX5_ESWITCH_MANAGER(dev) &&
-           mlx5_esw_vport_match_metadata_supported(esw))
-               esw->flags |= MLX5_ESWITCH_VPORT_MATCH_METADATA;
-
-       dev->priv.eswitch = esw;
        BLOCKING_INIT_NOTIFIER_HEAD(&esw->n_head);
 
        esw_info(dev,
@@ -1908,6 +1904,7 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
 
 reps_err:
        mlx5_esw_vports_cleanup(esw);
+       dev->priv.eswitch = NULL;
 abort:
        if (esw->work_queue)
                destroy_workqueue(esw->work_queue);
@@ -1926,7 +1923,6 @@ void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw)
 
        esw_info(esw->dev, "cleanup\n");
 
-       esw->dev->priv.eswitch = NULL;
        destroy_workqueue(esw->work_queue);
        WARN_ON(refcount_read(&esw->qos.refcnt));
        mutex_destroy(&esw->state_lock);
@@ -1937,6 +1933,7 @@ void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw)
        mutex_destroy(&esw->offloads.encap_tbl_lock);
        mutex_destroy(&esw->offloads.decap_tbl_lock);
        esw_offloads_cleanup(esw);
+       esw->dev->priv.eswitch = NULL;
        mlx5_esw_vports_cleanup(esw);
        debugfs_remove_recursive(esw->debugfs_root);
        devl_params_unregister(priv_to_devlink(esw->dev), mlx5_eswitch_params,
index baaae628b0a0f6510e2c350cbab0b6309b32da52..1f60954c12f7257cacf0c7e46539c949ac6eaf7b 100644 (file)
@@ -43,6 +43,7 @@
 #include "rdma.h"
 #include "en.h"
 #include "fs_core.h"
+#include "lib/mlx5.h"
 #include "lib/devcom.h"
 #include "lib/eq.h"
 #include "lib/fs_chains.h"
@@ -2476,6 +2477,10 @@ int esw_offloads_init(struct mlx5_eswitch *esw)
        if (err)
                return err;
 
+       if (MLX5_ESWITCH_MANAGER(esw->dev) &&
+           mlx5_esw_vport_match_metadata_supported(esw))
+               esw->flags |= MLX5_ESWITCH_VPORT_MATCH_METADATA;
+
        err = devl_params_register(priv_to_devlink(esw->dev),
                                   esw_devlink_params,
                                   ARRAY_SIZE(esw_devlink_params));
@@ -3707,6 +3712,12 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
        if (esw_mode_from_devlink(mode, &mlx5_mode))
                return -EINVAL;
 
+       if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV && mlx5_get_sd(esw->dev)) {
+               NL_SET_ERR_MSG_MOD(extack,
+                                  "Can't change E-Switch mode to switchdev when multi-PF netdev (Socket Direct) is configured.");
+               return -EPERM;
+       }
+
        mlx5_lag_disable_change(esw->dev);
        err = mlx5_esw_try_lock(esw);
        if (err < 0) {
index e6bfa7e4f146caf5b05506beaa6c9aabc6c4f74d..cf085a478e3e4c69ffdd4ee9bb24f0036e27c66d 100644 (file)
@@ -1664,6 +1664,16 @@ static int create_auto_flow_group(struct mlx5_flow_table *ft,
        return err;
 }
 
+static bool mlx5_pkt_reformat_cmp(struct mlx5_pkt_reformat *p1,
+                                 struct mlx5_pkt_reformat *p2)
+{
+       return p1->owner == p2->owner &&
+               (p1->owner == MLX5_FLOW_RESOURCE_OWNER_FW ?
+                p1->id == p2->id :
+                mlx5_fs_dr_action_get_pkt_reformat_id(p1) ==
+                mlx5_fs_dr_action_get_pkt_reformat_id(p2));
+}
+
 static bool mlx5_flow_dests_cmp(struct mlx5_flow_destination *d1,
                                struct mlx5_flow_destination *d2)
 {
@@ -1675,8 +1685,8 @@ static bool mlx5_flow_dests_cmp(struct mlx5_flow_destination *d1,
                     ((d1->vport.flags & MLX5_FLOW_DEST_VPORT_VHCA_ID) ?
                      (d1->vport.vhca_id == d2->vport.vhca_id) : true) &&
                     ((d1->vport.flags & MLX5_FLOW_DEST_VPORT_REFORMAT_ID) ?
-                     (d1->vport.pkt_reformat->id ==
-                      d2->vport.pkt_reformat->id) : true)) ||
+                     mlx5_pkt_reformat_cmp(d1->vport.pkt_reformat,
+                                           d2->vport.pkt_reformat) : true)) ||
                    (d1->type == MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE &&
                     d1->ft == d2->ft) ||
                    (d1->type == MLX5_FLOW_DESTINATION_TYPE_TIR &&
@@ -1808,8 +1818,9 @@ static struct mlx5_flow_handle *add_rule_fg(struct mlx5_flow_group *fg,
        }
        trace_mlx5_fs_set_fte(fte, false);
 
+       /* Link newly added rules into the tree. */
        for (i = 0; i < handle->num_rules; i++) {
-               if (refcount_read(&handle->rule[i]->node.refcount) == 1) {
+               if (!handle->rule[i]->node.parent) {
                        tree_add_node(&handle->rule[i]->node, &fte->node);
                        trace_mlx5_fs_add_rule(handle->rule[i]);
                }
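Switching the "already linked" test from `refcount == 1` to `!node.parent` makes the check structural rather than inferential: a newly added rule whose refcount was already elevated by another path would have been skipped by the refcount heuristic even though it was never attached to the tree. A small sketch of the idiom (simplified node layout, not the fs_core types):

```c
#include <stddef.h>

struct tnode {
	struct tnode *parent;
	int refcount;
};

/* Attach `child` under `parent` only if it is not yet in the tree.
 * The parent pointer is authoritative; the refcount is not, since
 * other code paths may hold references to a still-unlinked node. */
static int tree_add_if_unlinked(struct tnode *child, struct tnode *parent)
{
	if (child->parent)
		return 0;		/* already linked */
	child->parent = parent;
	child->refcount++;
	return 1;
}
```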
index c2593625c09ad6a9150e03baeda0ae41a1a010be..59806553889e907f7ff2938af2063b1e3b73ca1e 100644 (file)
@@ -1480,6 +1480,14 @@ int mlx5_init_one_devl_locked(struct mlx5_core_dev *dev)
        if (err)
                goto err_register;
 
+       err = mlx5_crdump_enable(dev);
+       if (err)
+               mlx5_core_err(dev, "mlx5_crdump_enable failed with error code %d\n", err);
+
+       err = mlx5_hwmon_dev_register(dev);
+       if (err)
+               mlx5_core_err(dev, "mlx5_hwmon_dev_register failed with error code %d\n", err);
+
        mutex_unlock(&dev->intf_state_mutex);
        return 0;
 
@@ -1505,7 +1513,10 @@ int mlx5_init_one(struct mlx5_core_dev *dev)
        int err;
 
        devl_lock(devlink);
+       devl_register(devlink);
        err = mlx5_init_one_devl_locked(dev);
+       if (err)
+               devl_unregister(devlink);
        devl_unlock(devlink);
        return err;
 }
@@ -1517,6 +1528,8 @@ void mlx5_uninit_one(struct mlx5_core_dev *dev)
        devl_lock(devlink);
        mutex_lock(&dev->intf_state_mutex);
 
+       mlx5_hwmon_dev_unregister(dev);
+       mlx5_crdump_disable(dev);
        mlx5_unregister_device(dev);
 
        if (!test_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state)) {
@@ -1534,6 +1547,7 @@ void mlx5_uninit_one(struct mlx5_core_dev *dev)
        mlx5_function_teardown(dev, true);
 out:
        mutex_unlock(&dev->intf_state_mutex);
+       devl_unregister(devlink);
        devl_unlock(devlink);
 }
 
@@ -1680,16 +1694,20 @@ int mlx5_init_one_light(struct mlx5_core_dev *dev)
        }
 
        devl_lock(devlink);
+       devl_register(devlink);
+
        err = mlx5_devlink_params_register(priv_to_devlink(dev));
-       devl_unlock(devlink);
        if (err) {
                mlx5_core_warn(dev, "mlx5_devlink_param_reg err = %d\n", err);
                goto query_hca_caps_err;
        }
 
+       devl_unlock(devlink);
        return 0;
 
 query_hca_caps_err:
+       devl_unregister(devlink);
+       devl_unlock(devlink);
        mlx5_function_disable(dev, true);
 out:
        dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
@@ -1702,6 +1720,7 @@ void mlx5_uninit_one_light(struct mlx5_core_dev *dev)
 
        devl_lock(devlink);
        mlx5_devlink_params_unregister(priv_to_devlink(dev));
+       devl_unregister(devlink);
        devl_unlock(devlink);
        if (dev->state != MLX5_DEVICE_STATE_UP)
                return;
@@ -1943,16 +1962,7 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
                goto err_init_one;
        }
 
-       err = mlx5_crdump_enable(dev);
-       if (err)
-               dev_err(&pdev->dev, "mlx5_crdump_enable failed with error code %d\n", err);
-
-       err = mlx5_hwmon_dev_register(dev);
-       if (err)
-               mlx5_core_err(dev, "mlx5_hwmon_dev_register failed with error code %d\n", err);
-
        pci_save_state(pdev);
-       devlink_register(devlink);
        return 0;
 
 err_init_one:
@@ -1973,16 +1983,9 @@ static void remove_one(struct pci_dev *pdev)
        struct devlink *devlink = priv_to_devlink(dev);
 
        set_bit(MLX5_BREAK_FW_WAIT, &dev->intf_state);
-       /* mlx5_drain_fw_reset() and mlx5_drain_health_wq() are using
-        * devlink notify APIs.
-        * Hence, we must drain them before unregistering the devlink.
-        */
        mlx5_drain_fw_reset(dev);
        mlx5_drain_health_wq(dev);
-       devlink_unregister(devlink);
        mlx5_sriov_disable(pdev, false);
-       mlx5_hwmon_dev_unregister(dev);
-       mlx5_crdump_disable(dev);
        mlx5_uninit_one(dev);
        mlx5_pci_close(dev);
        mlx5_mdev_uninit(dev);
index 4dcf995cb1a2042c39938ee2f166a6c3d3e6ef24..6bac8ad70ba60bf9982a110f7e115183858e0497 100644 (file)
@@ -19,6 +19,7 @@
 #define MLX5_IRQ_CTRL_SF_MAX 8
 /* min num of vectors for SFs to be enabled */
 #define MLX5_IRQ_VEC_COMP_BASE_SF 2
+#define MLX5_IRQ_VEC_COMP_BASE 1
 
 #define MLX5_EQ_SHARE_IRQ_MAX_COMP (8)
 #define MLX5_EQ_SHARE_IRQ_MAX_CTRL (UINT_MAX)
@@ -246,6 +247,7 @@ static void irq_set_name(struct mlx5_irq_pool *pool, char *name, int vecidx)
                return;
        }
 
+       vecidx -= MLX5_IRQ_VEC_COMP_BASE;
        snprintf(name, MLX5_MAX_IRQ_NAME, "mlx5_comp%d", vecidx);
 }
 
@@ -585,7 +587,7 @@ struct mlx5_irq *mlx5_irq_request_vector(struct mlx5_core_dev *dev, u16 cpu,
        struct mlx5_irq_table *table = mlx5_irq_table_get(dev);
        struct mlx5_irq_pool *pool = table->pcif_pool;
        struct irq_affinity_desc af_desc;
-       int offset = 1;
+       int offset = MLX5_IRQ_VEC_COMP_BASE;
 
        if (!pool->xa_num_irqs.max)
                offset = 0;
index bc863e1f062e6bd316f6b54f87850e11123bbfea..e3bf8c7e4baa62e336415e495a8a385e1edb0654 100644 (file)
@@ -101,7 +101,6 @@ static void mlx5_sf_dev_remove(struct auxiliary_device *adev)
        devlink = priv_to_devlink(mdev);
        set_bit(MLX5_BREAK_FW_WAIT, &mdev->intf_state);
        mlx5_drain_health_wq(mdev);
-       devlink_unregister(devlink);
        if (mlx5_dev_is_lightweight(mdev))
                mlx5_uninit_one_light(mdev);
        else
index 64f4cc284aea41715abecb1167439efe401951f8..030a5776c937406540645462b5950cd209c37974 100644 (file)
@@ -205,12 +205,11 @@ dr_dump_hex_print(char hex[DR_HEX_SIZE], char *src, u32 size)
 }
 
 static int
-dr_dump_rule_action_mem(struct seq_file *file, const u64 rule_id,
+dr_dump_rule_action_mem(struct seq_file *file, char *buff, const u64 rule_id,
                        struct mlx5dr_rule_action_member *action_mem)
 {
        struct mlx5dr_action *action = action_mem->action;
        const u64 action_id = DR_DBG_PTR_TO_ID(action);
-       char buff[MLX5DR_DEBUG_DUMP_BUFF_LENGTH];
        u64 hit_tbl_ptr, miss_tbl_ptr;
        u32 hit_tbl_id, miss_tbl_id;
        int ret;
@@ -488,10 +487,9 @@ dr_dump_rule_action_mem(struct seq_file *file, const u64 rule_id,
 }
 
 static int
-dr_dump_rule_mem(struct seq_file *file, struct mlx5dr_ste *ste,
+dr_dump_rule_mem(struct seq_file *file, char *buff, struct mlx5dr_ste *ste,
                 bool is_rx, const u64 rule_id, u8 format_ver)
 {
-       char buff[MLX5DR_DEBUG_DUMP_BUFF_LENGTH];
        char hw_ste_dump[DR_HEX_SIZE];
        u32 mem_rec_type;
        int ret;
@@ -522,7 +520,8 @@ dr_dump_rule_mem(struct seq_file *file, struct mlx5dr_ste *ste,
 }
 
 static int
-dr_dump_rule_rx_tx(struct seq_file *file, struct mlx5dr_rule_rx_tx *rule_rx_tx,
+dr_dump_rule_rx_tx(struct seq_file *file, char *buff,
+                  struct mlx5dr_rule_rx_tx *rule_rx_tx,
                   bool is_rx, const u64 rule_id, u8 format_ver)
 {
        struct mlx5dr_ste *ste_arr[DR_RULE_MAX_STES + DR_ACTION_MAX_STES];
@@ -533,7 +532,7 @@ dr_dump_rule_rx_tx(struct seq_file *file, struct mlx5dr_rule_rx_tx *rule_rx_tx,
                return 0;
 
        while (i--) {
-               ret = dr_dump_rule_mem(file, ste_arr[i], is_rx, rule_id,
+               ret = dr_dump_rule_mem(file, buff, ste_arr[i], is_rx, rule_id,
                                       format_ver);
                if (ret < 0)
                        return ret;
@@ -542,7 +541,8 @@ dr_dump_rule_rx_tx(struct seq_file *file, struct mlx5dr_rule_rx_tx *rule_rx_tx,
        return 0;
 }
 
-static int dr_dump_rule(struct seq_file *file, struct mlx5dr_rule *rule)
+static noinline_for_stack int
+dr_dump_rule(struct seq_file *file, struct mlx5dr_rule *rule)
 {
        struct mlx5dr_rule_action_member *action_mem;
        const u64 rule_id = DR_DBG_PTR_TO_ID(rule);
@@ -565,19 +565,19 @@ static int dr_dump_rule(struct seq_file *file, struct mlx5dr_rule *rule)
                return ret;
 
        if (rx->nic_matcher) {
-               ret = dr_dump_rule_rx_tx(file, rx, true, rule_id, format_ver);
+               ret = dr_dump_rule_rx_tx(file, buff, rx, true, rule_id, format_ver);
                if (ret < 0)
                        return ret;
        }
 
        if (tx->nic_matcher) {
-               ret = dr_dump_rule_rx_tx(file, tx, false, rule_id, format_ver);
+               ret = dr_dump_rule_rx_tx(file, buff, tx, false, rule_id, format_ver);
                if (ret < 0)
                        return ret;
        }
 
        list_for_each_entry(action_mem, &rule->rule_actions_list, list) {
-               ret = dr_dump_rule_action_mem(file, rule_id, action_mem);
+               ret = dr_dump_rule_action_mem(file, buff, rule_id, action_mem);
                if (ret < 0)
                        return ret;
        }
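The dr_dbg changes in this file all follow one pattern: each helper used to declare its own `char buff[MLX5DR_DEBUG_DUMP_BUFF_LENGTH]` on the stack, so a dump traversal stacked one such array per call level; threading a single caller-owned buffer through the chain (and marking the top-level entry points `noinline_for_stack`) bounds the cost to one buffer regardless of depth. A standalone sketch of the refactor (toy record format and hypothetical buffer length):

```c
#include <stdio.h>
#include <string.h>

#define BUFF_LEN 64		/* stand-in for the dump buffer length */

/* Leaf formatter writes into the caller's buffer rather than
 * declaring its own BUFF_LEN array on the stack. */
static int dump_leaf(char *buff, unsigned long id)
{
	int ret = snprintf(buff, BUFF_LEN, "leaf,0x%lx\n", id);

	return (ret < 0 || ret >= BUFF_LEN) ? -1 : 0;
}

/* The top-level walker owns the single scratch buffer for the whole
 * traversal, so stack usage stays flat however deep the dump call
 * chain goes. */
static int dump_all(char *out, size_t outlen, const unsigned long *ids, int n)
{
	char buff[BUFF_LEN];
	int i;

	out[0] = '\0';
	for (i = 0; i < n; i++) {
		if (dump_leaf(buff, ids[i]))
			return -1;
		if (strlen(out) + strlen(buff) + 1 > outlen)
			return -1;
		strcat(out, buff);
	}
	return 0;
}
```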
@@ -586,10 +586,10 @@ static int dr_dump_rule(struct seq_file *file, struct mlx5dr_rule *rule)
 }
 
 static int
-dr_dump_matcher_mask(struct seq_file *file, struct mlx5dr_match_param *mask,
+dr_dump_matcher_mask(struct seq_file *file, char *buff,
+                    struct mlx5dr_match_param *mask,
                     u8 criteria, const u64 matcher_id)
 {
-       char buff[MLX5DR_DEBUG_DUMP_BUFF_LENGTH];
        char dump[DR_HEX_SIZE];
        int ret;
 
@@ -681,10 +681,10 @@ dr_dump_matcher_mask(struct seq_file *file, struct mlx5dr_match_param *mask,
 }
 
 static int
-dr_dump_matcher_builder(struct seq_file *file, struct mlx5dr_ste_build *builder,
+dr_dump_matcher_builder(struct seq_file *file, char *buff,
+                       struct mlx5dr_ste_build *builder,
                        u32 index, bool is_rx, const u64 matcher_id)
 {
-       char buff[MLX5DR_DEBUG_DUMP_BUFF_LENGTH];
        int ret;
 
        ret = snprintf(buff, MLX5DR_DEBUG_DUMP_BUFF_LENGTH,
@@ -702,11 +702,10 @@ dr_dump_matcher_builder(struct seq_file *file, struct mlx5dr_ste_build *builder,
 }
 
 static int
-dr_dump_matcher_rx_tx(struct seq_file *file, bool is_rx,
+dr_dump_matcher_rx_tx(struct seq_file *file, char *buff, bool is_rx,
                      struct mlx5dr_matcher_rx_tx *matcher_rx_tx,
                      const u64 matcher_id)
 {
-       char buff[MLX5DR_DEBUG_DUMP_BUFF_LENGTH];
        enum dr_dump_rec_type rec_type;
        u64 s_icm_addr, e_icm_addr;
        int i, ret;
@@ -731,7 +730,7 @@ dr_dump_matcher_rx_tx(struct seq_file *file, bool is_rx,
                return ret;
 
        for (i = 0; i < matcher_rx_tx->num_of_builders; i++) {
-               ret = dr_dump_matcher_builder(file,
+               ret = dr_dump_matcher_builder(file, buff,
                                              &matcher_rx_tx->ste_builder[i],
                                              i, is_rx, matcher_id);
                if (ret < 0)
@@ -741,7 +740,7 @@ dr_dump_matcher_rx_tx(struct seq_file *file, bool is_rx,
        return 0;
 }
 
-static int
+static noinline_for_stack int
 dr_dump_matcher(struct seq_file *file, struct mlx5dr_matcher *matcher)
 {
        struct mlx5dr_matcher_rx_tx *rx = &matcher->rx;
@@ -763,19 +762,19 @@ dr_dump_matcher(struct seq_file *file, struct mlx5dr_matcher *matcher)
        if (ret)
                return ret;
 
-       ret = dr_dump_matcher_mask(file, &matcher->mask,
+       ret = dr_dump_matcher_mask(file, buff, &matcher->mask,
                                   matcher->match_criteria, matcher_id);
        if (ret < 0)
                return ret;
 
        if (rx->nic_tbl) {
-               ret = dr_dump_matcher_rx_tx(file, true, rx, matcher_id);
+               ret = dr_dump_matcher_rx_tx(file, buff, true, rx, matcher_id);
                if (ret < 0)
                        return ret;
        }
 
        if (tx->nic_tbl) {
-               ret = dr_dump_matcher_rx_tx(file, false, tx, matcher_id);
+               ret = dr_dump_matcher_rx_tx(file, buff, false, tx, matcher_id);
                if (ret < 0)
                        return ret;
        }
@@ -803,11 +802,10 @@ dr_dump_matcher_all(struct seq_file *file, struct mlx5dr_matcher *matcher)
 }
 
 static int
-dr_dump_table_rx_tx(struct seq_file *file, bool is_rx,
+dr_dump_table_rx_tx(struct seq_file *file, char *buff, bool is_rx,
                    struct mlx5dr_table_rx_tx *table_rx_tx,
                    const u64 table_id)
 {
-       char buff[MLX5DR_DEBUG_DUMP_BUFF_LENGTH];
        enum dr_dump_rec_type rec_type;
        u64 s_icm_addr;
        int ret;
@@ -829,7 +827,8 @@ dr_dump_table_rx_tx(struct seq_file *file, bool is_rx,
        return 0;
 }
 
-static int dr_dump_table(struct seq_file *file, struct mlx5dr_table *table)
+static noinline_for_stack int
+dr_dump_table(struct seq_file *file, struct mlx5dr_table *table)
 {
        struct mlx5dr_table_rx_tx *rx = &table->rx;
        struct mlx5dr_table_rx_tx *tx = &table->tx;
@@ -848,14 +847,14 @@ static int dr_dump_table(struct seq_file *file, struct mlx5dr_table *table)
                return ret;
 
        if (rx->nic_dmn) {
-               ret = dr_dump_table_rx_tx(file, true, rx,
+               ret = dr_dump_table_rx_tx(file, buff, true, rx,
                                          DR_DBG_PTR_TO_ID(table));
                if (ret < 0)
                        return ret;
        }
 
        if (tx->nic_dmn) {
-               ret = dr_dump_table_rx_tx(file, false, tx,
+               ret = dr_dump_table_rx_tx(file, buff, false, tx,
                                          DR_DBG_PTR_TO_ID(table));
                if (ret < 0)
                        return ret;
@@ -881,10 +880,10 @@ static int dr_dump_table_all(struct seq_file *file, struct mlx5dr_table *tbl)
 }
 
 static int
-dr_dump_send_ring(struct seq_file *file, struct mlx5dr_send_ring *ring,
+dr_dump_send_ring(struct seq_file *file, char *buff,
+                 struct mlx5dr_send_ring *ring,
                  const u64 domain_id)
 {
-       char buff[MLX5DR_DEBUG_DUMP_BUFF_LENGTH];
        int ret;
 
        ret = snprintf(buff, MLX5DR_DEBUG_DUMP_BUFF_LENGTH,
@@ -902,13 +901,13 @@ dr_dump_send_ring(struct seq_file *file, struct mlx5dr_send_ring *ring,
        return 0;
 }
 
-static noinline_for_stack int
+static int
 dr_dump_domain_info_flex_parser(struct seq_file *file,
+                               char *buff,
                                const char *flex_parser_name,
                                const u8 flex_parser_value,
                                const u64 domain_id)
 {
-       char buff[MLX5DR_DEBUG_DUMP_BUFF_LENGTH];
        int ret;
 
        ret = snprintf(buff, MLX5DR_DEBUG_DUMP_BUFF_LENGTH,
@@ -925,11 +924,11 @@ dr_dump_domain_info_flex_parser(struct seq_file *file,
        return 0;
 }
 
-static noinline_for_stack int
-dr_dump_domain_info_caps(struct seq_file *file, struct mlx5dr_cmd_caps *caps,
+static int
+dr_dump_domain_info_caps(struct seq_file *file, char *buff,
+                        struct mlx5dr_cmd_caps *caps,
                         const u64 domain_id)
 {
-       char buff[MLX5DR_DEBUG_DUMP_BUFF_LENGTH];
        struct mlx5dr_cmd_vport_cap *vport_caps;
        unsigned long i, vports_num;
        int ret;
@@ -969,34 +968,35 @@ dr_dump_domain_info_caps(struct seq_file *file, struct mlx5dr_cmd_caps *caps,
 }
 
 static int
-dr_dump_domain_info(struct seq_file *file, struct mlx5dr_domain_info *info,
+dr_dump_domain_info(struct seq_file *file, char *buff,
+                   struct mlx5dr_domain_info *info,
                    const u64 domain_id)
 {
        int ret;
 
-       ret = dr_dump_domain_info_caps(file, &info->caps, domain_id);
+       ret = dr_dump_domain_info_caps(file, buff, &info->caps, domain_id);
        if (ret < 0)
                return ret;
 
-       ret = dr_dump_domain_info_flex_parser(file, "icmp_dw0",
+       ret = dr_dump_domain_info_flex_parser(file, buff, "icmp_dw0",
                                              info->caps.flex_parser_id_icmp_dw0,
                                              domain_id);
        if (ret < 0)
                return ret;
 
-       ret = dr_dump_domain_info_flex_parser(file, "icmp_dw1",
+       ret = dr_dump_domain_info_flex_parser(file, buff, "icmp_dw1",
                                              info->caps.flex_parser_id_icmp_dw1,
                                              domain_id);
        if (ret < 0)
                return ret;
 
-       ret = dr_dump_domain_info_flex_parser(file, "icmpv6_dw0",
+       ret = dr_dump_domain_info_flex_parser(file, buff, "icmpv6_dw0",
                                              info->caps.flex_parser_id_icmpv6_dw0,
                                              domain_id);
        if (ret < 0)
                return ret;
 
-       ret = dr_dump_domain_info_flex_parser(file, "icmpv6_dw1",
+       ret = dr_dump_domain_info_flex_parser(file, buff, "icmpv6_dw1",
                                              info->caps.flex_parser_id_icmpv6_dw1,
                                              domain_id);
        if (ret < 0)
@@ -1032,12 +1032,12 @@ dr_dump_domain(struct seq_file *file, struct mlx5dr_domain *dmn)
        if (ret)
                return ret;
 
-       ret = dr_dump_domain_info(file, &dmn->info, domain_id);
+       ret = dr_dump_domain_info(file, buff, &dmn->info, domain_id);
        if (ret < 0)
                return ret;
 
        if (dmn->info.supp_sw_steering) {
-               ret = dr_dump_send_ring(file, dmn->send_ring, domain_id);
+               ret = dr_dump_send_ring(file, buff, dmn->send_ring, domain_id);
                if (ret < 0)
                        return ret;
        }
index 3a1b1a1f5a1951069f9c3e5ee5e3a10c1be55eb6..60dd2fd603a8554f02f5d649e8d290dc074b5d72 100644 (file)
@@ -731,7 +731,7 @@ static int sparx5_port_pcs_low_set(struct sparx5 *sparx5,
        bool sgmii = false, inband_aneg = false;
        int err;
 
-       if (port->conf.inband) {
+       if (conf->inband) {
                if (conf->portmode == PHY_INTERFACE_MODE_SGMII ||
                    conf->portmode == PHY_INTERFACE_MODE_QSGMII)
                        inband_aneg = true; /* Cisco-SGMII in-band-aneg */
@@ -948,7 +948,7 @@ int sparx5_port_pcs_set(struct sparx5 *sparx5,
        if (err)
                return -EINVAL;
 
-       if (port->conf.inband) {
+       if (conf->inband) {
                /* Enable/disable 1G counters in ASM */
                spx5_rmw(ASM_PORT_CFG_CSC_STAT_DIS_SET(high_speed_dev),
                         ASM_PORT_CFG_CSC_STAT_DIS,
index 8a27328eae34b7ef10b134d354e7ac0ccc032b28..0fc5fe564ae50be28bc6568f90d339d840a4b8d1 100644 (file)
@@ -5046,7 +5046,8 @@ static void rtl_remove_one(struct pci_dev *pdev)
 
        cancel_work_sync(&tp->wk.work);
 
-       r8169_remove_leds(tp->leds);
+       if (IS_ENABLED(CONFIG_R8169_LEDS))
+               r8169_remove_leds(tp->leds);
 
        unregister_netdev(tp->dev);
 
index 943d72bdd794ca5e6258cb02841447ca38898251..27281a9a8951dbd53f30a27a14e0ac0be9a35c5d 100644 (file)
@@ -2076,6 +2076,7 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
        bool vwc = ns->ctrl->vwc & NVME_CTRL_VWC_PRESENT;
        struct queue_limits lim;
        struct nvme_id_ns_nvm *nvm = NULL;
+       struct nvme_zone_info zi = {};
        struct nvme_id_ns *id;
        sector_t capacity;
        unsigned lbaf;
@@ -2088,9 +2089,10 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
        if (id->ncap == 0) {
                /* namespace not allocated or attached */
                info->is_removed = true;
-               ret = -ENODEV;
+               ret = -ENXIO;
                goto out;
        }
+       lbaf = nvme_lbaf_index(id->flbas);
 
        if (ns->ctrl->ctratt & NVME_CTRL_ATTR_ELBAS) {
                ret = nvme_identify_ns_nvm(ns->ctrl, info->nsid, &nvm);
@@ -2098,8 +2100,14 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
                        goto out;
        }
 
+       if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
+           ns->head->ids.csi == NVME_CSI_ZNS) {
+               ret = nvme_query_zone_info(ns, lbaf, &zi);
+               if (ret < 0)
+                       goto out;
+       }
+
        blk_mq_freeze_queue(ns->disk->queue);
-       lbaf = nvme_lbaf_index(id->flbas);
        ns->head->lba_shift = id->lbaf[lbaf].ds;
        ns->head->nuse = le64_to_cpu(id->nuse);
        capacity = nvme_lba_to_sect(ns->head, le64_to_cpu(id->nsze));
@@ -2112,13 +2120,8 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
                capacity = 0;
        nvme_config_discard(ns, &lim);
        if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
-           ns->head->ids.csi == NVME_CSI_ZNS) {
-               ret = nvme_update_zone_info(ns, lbaf, &lim);
-               if (ret) {
-                       blk_mq_unfreeze_queue(ns->disk->queue);
-                       goto out;
-               }
-       }
+           ns->head->ids.csi == NVME_CSI_ZNS)
+               nvme_update_zone_info(ns, &lim, &zi);
        ret = queue_limits_commit_update(ns->disk->queue, &lim);
        if (ret) {
                blk_mq_unfreeze_queue(ns->disk->queue);
@@ -2201,6 +2204,7 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
        }
 
        if (!ret && nvme_ns_head_multipath(ns->head)) {
+               struct queue_limits *ns_lim = &ns->disk->queue->limits;
                struct queue_limits lim;
 
                blk_mq_freeze_queue(ns->head->disk->queue);
@@ -2212,7 +2216,26 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
                set_disk_ro(ns->head->disk, nvme_ns_is_readonly(ns, info));
                nvme_mpath_revalidate_paths(ns);
 
+               /*
+                * queue_limits mixes values that are the hardware limitations
+                * for bio splitting with what is the device configuration.
+                *
+                * For NVMe the device configuration can change after e.g. a
+                * Format command, and we really want to pick up the new format
+                * value here.  But we must still stack the queue limits to the
+                * least common denominator for multipathing to split the bios
+                * properly.
+                *
+                * To work around this, we explicitly set the device
+                * configuration to those that we just queried, but only stack
+                * the splitting limits in to make sure we still obey possibly
+                * lower limitations of other controllers.
+                */
                lim = queue_limits_start_update(ns->head->disk->queue);
+               lim.logical_block_size = ns_lim->logical_block_size;
+               lim.physical_block_size = ns_lim->physical_block_size;
+               lim.io_min = ns_lim->io_min;
+               lim.io_opt = ns_lim->io_opt;
                queue_limits_stack_bdev(&lim, ns->disk->part0, 0,
                                        ns->head->disk->disk_name);
                ret = queue_limits_commit_update(ns->head->disk->queue, &lim);
index 68a5d971657bb5080f717f5ae1ec5645830aadd5..a5b29e9ad342df82730ba9a06ea28e755d1953ee 100644 (file)
@@ -2428,7 +2428,7 @@ nvme_fc_ctrl_get(struct nvme_fc_ctrl *ctrl)
  * controller. Called after last nvme_put_ctrl() call
  */
 static void
-nvme_fc_nvme_ctrl_freed(struct nvme_ctrl *nctrl)
+nvme_fc_free_ctrl(struct nvme_ctrl *nctrl)
 {
        struct nvme_fc_ctrl *ctrl = to_fc_ctrl(nctrl);
 
@@ -3384,7 +3384,7 @@ static const struct nvme_ctrl_ops nvme_fc_ctrl_ops = {
        .reg_read32             = nvmf_reg_read32,
        .reg_read64             = nvmf_reg_read64,
        .reg_write32            = nvmf_reg_write32,
-       .free_ctrl              = nvme_fc_nvme_ctrl_freed,
+       .free_ctrl              = nvme_fc_free_ctrl,
        .submit_async_event     = nvme_fc_submit_async_event,
        .delete_ctrl            = nvme_fc_delete_ctrl,
        .get_address            = nvmf_get_address,
index 24193fcb8bd584de277606d57738c1aea5a9cb49..d0ed64dc7380e51577bc6ece92db1d0a273905a7 100644 (file)
@@ -1036,10 +1036,18 @@ static inline bool nvme_disk_is_ns_head(struct gendisk *disk)
 }
 #endif /* CONFIG_NVME_MULTIPATH */
 
+struct nvme_zone_info {
+       u64 zone_size;
+       unsigned int max_open_zones;
+       unsigned int max_active_zones;
+};
+
 int nvme_ns_report_zones(struct nvme_ns *ns, sector_t sector,
                unsigned int nr_zones, report_zones_cb cb, void *data);
-int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf,
-               struct queue_limits *lim);
+int nvme_query_zone_info(struct nvme_ns *ns, unsigned lbaf,
+               struct nvme_zone_info *zi);
+void nvme_update_zone_info(struct nvme_ns *ns, struct queue_limits *lim,
+               struct nvme_zone_info *zi);
 #ifdef CONFIG_BLK_DEV_ZONED
 blk_status_t nvme_setup_zone_mgmt_send(struct nvme_ns *ns, struct request *req,
                                       struct nvme_command *cmnd,
index 722384bcc765cda778972c8a86345eaaf18a7353..77aa0f440a6d2a5a538ad06e9cae9ae97c409aef 100644 (file)
@@ -35,8 +35,8 @@ static int nvme_set_max_append(struct nvme_ctrl *ctrl)
        return 0;
 }
 
-int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf,
-               struct queue_limits *lim)
+int nvme_query_zone_info(struct nvme_ns *ns, unsigned lbaf,
+               struct nvme_zone_info *zi)
 {
        struct nvme_effects_log *log = ns->head->effects;
        struct nvme_command c = { };
@@ -89,27 +89,34 @@ int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf,
                goto free_data;
        }
 
-       ns->head->zsze =
-               nvme_lba_to_sect(ns->head, le64_to_cpu(id->lbafe[lbaf].zsze));
-       if (!is_power_of_2(ns->head->zsze)) {
+       zi->zone_size = le64_to_cpu(id->lbafe[lbaf].zsze);
+       if (!is_power_of_2(zi->zone_size)) {
                dev_warn(ns->ctrl->device,
-                       "invalid zone size:%llu for namespace:%u\n",
-                       ns->head->zsze, ns->head->ns_id);
+                       "invalid zone size: %llu for namespace: %u\n",
+                       zi->zone_size, ns->head->ns_id);
                status = -ENODEV;
                goto free_data;
        }
+       zi->max_open_zones = le32_to_cpu(id->mor) + 1;
+       zi->max_active_zones = le32_to_cpu(id->mar) + 1;
 
-       blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, ns->queue);
-       lim->zoned = 1;
-       lim->max_open_zones = le32_to_cpu(id->mor) + 1;
-       lim->max_active_zones = le32_to_cpu(id->mar) + 1;
-       lim->chunk_sectors = ns->head->zsze;
-       lim->max_zone_append_sectors = ns->ctrl->max_zone_append;
 free_data:
        kfree(id);
        return status;
 }
 
+void nvme_update_zone_info(struct nvme_ns *ns, struct queue_limits *lim,
+               struct nvme_zone_info *zi)
+{
+       lim->zoned = 1;
+       lim->max_open_zones = zi->max_open_zones;
+       lim->max_active_zones = zi->max_active_zones;
+       lim->max_zone_append_sectors = ns->ctrl->max_zone_append;
+       lim->chunk_sectors = ns->head->zsze =
+               nvme_lba_to_sect(ns->head, zi->zone_size);
+       blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, ns->queue);
+}
+
 static void *nvme_zns_alloc_report_buffer(struct nvme_ns *ns,
                                          unsigned int nr_zones, size_t *buflen)
 {
index 77a6e817b31596998e4424aa8205f8cfd9219f1d..a2325330bf22145202837aa5cf89d9ec6543ab59 100644 (file)
@@ -1613,6 +1613,11 @@ static struct config_group *nvmet_subsys_make(struct config_group *group,
                return ERR_PTR(-EINVAL);
        }
 
+       if (sysfs_streq(name, nvmet_disc_subsys->subsysnqn)) {
+               pr_err("can't create subsystem using unique discovery NQN\n");
+               return ERR_PTR(-EINVAL);
+       }
+
        subsys = nvmet_subsys_alloc(name, NVME_NQN_NVME);
        if (IS_ERR(subsys))
                return ERR_CAST(subsys);
@@ -2159,7 +2164,49 @@ static const struct config_item_type nvmet_hosts_type = {
 
 static struct config_group nvmet_hosts_group;
 
+static ssize_t nvmet_root_discovery_nqn_show(struct config_item *item,
+                                            char *page)
+{
+       return snprintf(page, PAGE_SIZE, "%s\n", nvmet_disc_subsys->subsysnqn);
+}
+
+static ssize_t nvmet_root_discovery_nqn_store(struct config_item *item,
+               const char *page, size_t count)
+{
+       struct list_head *entry;
+       size_t len;
+
+       len = strcspn(page, "\n");
+       if (!len || len > NVMF_NQN_FIELD_LEN - 1)
+               return -EINVAL;
+
+       down_write(&nvmet_config_sem);
+       list_for_each(entry, &nvmet_subsystems_group.cg_children) {
+               struct config_item *item =
+                       container_of(entry, struct config_item, ci_entry);
+
+               if (!strncmp(config_item_name(item), page, len)) {
+                       pr_err("duplicate NQN %s\n", config_item_name(item));
+                       up_write(&nvmet_config_sem);
+                       return -EINVAL;
+               }
+       }
+       memset(nvmet_disc_subsys->subsysnqn, 0, NVMF_NQN_FIELD_LEN);
+       memcpy(nvmet_disc_subsys->subsysnqn, page, len);
+       up_write(&nvmet_config_sem);
+
+       return len;
+}
+
+CONFIGFS_ATTR(nvmet_root_, discovery_nqn);
+
+static struct configfs_attribute *nvmet_root_attrs[] = {
+       &nvmet_root_attr_discovery_nqn,
+       NULL,
+};
+
 static const struct config_item_type nvmet_root_type = {
+       .ct_attrs               = nvmet_root_attrs,
        .ct_owner               = THIS_MODULE,
 };
 
index 6bbe4df0166ca56949a5f5b14ad90f68305d6f36..8860a3eb71ec891e948a34060f34b4b148553418 100644 (file)
@@ -1541,6 +1541,13 @@ static struct nvmet_subsys *nvmet_find_get_subsys(struct nvmet_port *port,
        }
 
        down_read(&nvmet_config_sem);
+       if (!strncmp(nvmet_disc_subsys->subsysnqn, subsysnqn,
+                               NVMF_NQN_SIZE)) {
+               if (kref_get_unless_zero(&nvmet_disc_subsys->ref)) {
+                       up_read(&nvmet_config_sem);
+                       return nvmet_disc_subsys;
+               }
+       }
        list_for_each_entry(p, &port->subsystems, entry) {
                if (!strncmp(p->subsys->subsysnqn, subsysnqn,
                                NVMF_NQN_SIZE)) {
index fd229f310c931fbfd6c3132185f2b73c135cd633..337ee1cb09ae644bb98bdf5e8da0525ff40230c3 100644 (file)
@@ -1115,16 +1115,21 @@ nvmet_fc_schedule_delete_assoc(struct nvmet_fc_tgt_assoc *assoc)
 }
 
 static bool
-nvmet_fc_assoc_exits(struct nvmet_fc_tgtport *tgtport, u64 association_id)
+nvmet_fc_assoc_exists(struct nvmet_fc_tgtport *tgtport, u64 association_id)
 {
        struct nvmet_fc_tgt_assoc *a;
+       bool found = false;
 
+       rcu_read_lock();
        list_for_each_entry_rcu(a, &tgtport->assoc_list, a_list) {
-               if (association_id == a->association_id)
-                       return true;
+               if (association_id == a->association_id) {
+                       found = true;
+                       break;
+               }
        }
+       rcu_read_unlock();
 
-       return false;
+       return found;
 }
 
 static struct nvmet_fc_tgt_assoc *
@@ -1164,13 +1169,11 @@ nvmet_fc_alloc_target_assoc(struct nvmet_fc_tgtport *tgtport, void *hosthandle)
                ran = ran << BYTES_FOR_QID_SHIFT;
 
                spin_lock_irqsave(&tgtport->lock, flags);
-               rcu_read_lock();
-               if (!nvmet_fc_assoc_exits(tgtport, ran)) {
+               if (!nvmet_fc_assoc_exists(tgtport, ran)) {
                        assoc->association_id = ran;
                        list_add_tail_rcu(&assoc->a_list, &tgtport->assoc_list);
                        done = true;
                }
-               rcu_read_unlock();
                spin_unlock_irqrestore(&tgtport->lock, flags);
        } while (!done);
 
index 3bf27052832f302ac72d366b986629e03db4e900..4d57a4e34105466f8997b210271b231d216cb9b5 100644 (file)
@@ -9,6 +9,7 @@
 
 #define pr_fmt(fmt)    "OF: " fmt
 
+#include <linux/device.h>
 #include <linux/of.h>
 #include <linux/spinlock.h>
 #include <linux/slab.h>
@@ -667,6 +668,17 @@ void of_changeset_destroy(struct of_changeset *ocs)
 {
        struct of_changeset_entry *ce, *cen;
 
+       /*
+        * When a device is deleted, the device links to/from it are also queued
+        * for deletion. Until these device links are freed, the devices
+        * themselves aren't freed. If the device being deleted is due to an
+        * overlay change, this device might be holding a reference to a device
+        * node that will be freed. So, wait until all already pending device
+        * links are deleted before freeing a device node. This ensures we don't
+        * free any device node that has a non-zero reference count.
+        */
+       device_link_wait_removal();
+
        list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node)
                __of_changeset_entry_destroy(ce);
 }
index 0e8aa974f0f2bb5262dfbc8b03978b6381bfd61e..f58e624953a20f25f058841b70eb9468d6f5d11e 100644 (file)
@@ -16,6 +16,14 @@ ssize_t of_modalias(const struct device_node *np, char *str, ssize_t len)
        ssize_t csize;
        ssize_t tsize;
 
+       /*
+        * Prevent a kernel oops in vsnprintf() -- it only allows passing a
+        * NULL ptr when the length is also 0. Also filter out the negative
+        * lengths...
+        */
+       if ((len > 0 && !str) || len < 0)
+               return -EINVAL;
+
        /* Name & Type */
        /* %p eats all alphanum characters, so %c must be used here */
        csize = snprintf(str, len, "of:N%pOFn%c%s", np, 'T',
index c78a6fd6c57f612221749d44673d47845911231f..b4efdddb2ad91f2c8b6a52f769a1fcff84206811 100644 (file)
@@ -313,6 +313,10 @@ static int riscv_pmu_event_init(struct perf_event *event)
        u64 event_config = 0;
        uint64_t cmask;
 
+       /* driver does not support branch stack sampling */
+       if (has_branch_stack(event))
+               return -EOPNOTSUPP;
+
        hwc->flags = 0;
        mapped_event = rvpmu->event_map(event, &event_config);
        if (mapped_event < 0) {
index 8ea867c2a01a371a64e1eb10327931861d306dc8..62bc24f6dcc7a82cb11361b133d65bab77d90152 100644 (file)
@@ -263,12 +263,6 @@ static int cros_ec_uart_probe(struct serdev_device *serdev)
        if (!ec_dev)
                return -ENOMEM;
 
-       ret = devm_serdev_device_open(dev, serdev);
-       if (ret) {
-               dev_err(dev, "Unable to open UART device");
-               return ret;
-       }
-
        serdev_device_set_drvdata(serdev, ec_dev);
        init_waitqueue_head(&ec_uart->response.wait_queue);
 
@@ -280,14 +274,6 @@ static int cros_ec_uart_probe(struct serdev_device *serdev)
                return ret;
        }
 
-       ret = serdev_device_set_baudrate(serdev, ec_uart->baudrate);
-       if (ret < 0) {
-               dev_err(dev, "Failed to set up host baud rate (%d)", ret);
-               return ret;
-       }
-
-       serdev_device_set_flow_control(serdev, ec_uart->flowcontrol);
-
        /* Initialize ec_dev for cros_ec  */
        ec_dev->phys_name = dev_name(dev);
        ec_dev->dev = dev;
@@ -301,6 +287,20 @@ static int cros_ec_uart_probe(struct serdev_device *serdev)
 
        serdev_device_set_client_ops(serdev, &cros_ec_uart_client_ops);
 
+       ret = devm_serdev_device_open(dev, serdev);
+       if (ret) {
+               dev_err(dev, "Unable to open UART device");
+               return ret;
+       }
+
+       ret = serdev_device_set_baudrate(serdev, ec_uart->baudrate);
+       if (ret < 0) {
+               dev_err(dev, "Failed to set up host baud rate (%d)", ret);
+               return ret;
+       }
+
+       serdev_device_set_flow_control(serdev, ec_uart->flowcontrol);
+
        return cros_ec_register(ec_dev);
 }
 
index ee2e164f86b9c2973e317bfbfe3297571a8cab48..38c932df6446ac5714d225ac0545ff16345a6e27 100644 (file)
@@ -597,6 +597,15 @@ static const struct dmi_system_id acer_quirks[] __initconst = {
                },
                .driver_data = &quirk_acer_predator_v4,
        },
+       {
+               .callback = dmi_matched,
+               .ident = "Acer Predator PH18-71",
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "Acer"),
+                       DMI_MATCH(DMI_PRODUCT_NAME, "Predator PH18-71"),
+               },
+               .driver_data = &quirk_acer_predator_v4,
+       },
        {
                .callback = set_force_caps,
                .ident = "Acer Aspire Switch 10E SW3-016",
index 7457ca2b27a60b7adadcebb251dba45a0e675e97..c7a8276458640adc888f99fee23fcc10b5ddf2e0 100644 (file)
@@ -49,6 +49,8 @@ static const struct acpi_device_id intel_hid_ids[] = {
        {"INTC1076", 0},
        {"INTC1077", 0},
        {"INTC1078", 0},
+       {"INTC107B", 0},
+       {"INTC10CB", 0},
        {"", 0},
 };
 MODULE_DEVICE_TABLE(acpi, intel_hid_ids);
@@ -504,6 +506,7 @@ static void notify_handler(acpi_handle handle, u32 event, void *context)
        struct platform_device *device = context;
        struct intel_hid_priv *priv = dev_get_drvdata(&device->dev);
        unsigned long long ev_index;
+       struct key_entry *ke;
        int err;
 
        /*
@@ -545,11 +548,15 @@ static void notify_handler(acpi_handle handle, u32 event, void *context)
                if (event == 0xc0 || !priv->array)
                        return;
 
-               if (!sparse_keymap_entry_from_scancode(priv->array, event)) {
+               ke = sparse_keymap_entry_from_scancode(priv->array, event);
+               if (!ke) {
                        dev_info(&device->dev, "unknown event 0x%x\n", event);
                        return;
                }
 
+               if (ke->type == KE_IGNORE)
+                       return;
+
 wakeup:
                pm_wakeup_hard_event(&device->dev);
 
index 084c355c86f5fa9050ccb881a7efa6682b538773..79bb2c801daa972a74b96596e7129583c7abb39c 100644 (file)
@@ -136,8 +136,6 @@ static int intel_vbtn_input_setup(struct platform_device *device)
        priv->switches_dev->id.bustype = BUS_HOST;
 
        if (priv->has_switches) {
-               detect_tablet_mode(&device->dev);
-
                ret = input_register_device(priv->switches_dev);
                if (ret)
                        return ret;
@@ -258,9 +256,6 @@ static const struct dmi_system_id dmi_switches_allow_list[] = {
 
 static bool intel_vbtn_has_switches(acpi_handle handle, bool dual_accel)
 {
-       unsigned long long vgbs;
-       acpi_status status;
-
        /* See dual_accel_detect.h for more info */
        if (dual_accel)
                return false;
@@ -268,8 +263,7 @@ static bool intel_vbtn_has_switches(acpi_handle handle, bool dual_accel)
        if (!dmi_check_system(dmi_switches_allow_list))
                return false;
 
-       status = acpi_evaluate_integer(handle, "VGBS", NULL, &vgbs);
-       return ACPI_SUCCESS(status);
+       return acpi_has_method(handle, "VGBS");
 }
 
 static int intel_vbtn_probe(struct platform_device *device)
@@ -316,6 +310,9 @@ static int intel_vbtn_probe(struct platform_device *device)
                if (ACPI_FAILURE(status))
                        dev_err(&device->dev, "Error VBDL failed with ACPI status %d\n", status);
        }
+       // Check switches after buttons since VBDL may have side effects.
+       if (has_switches)
+               detect_tablet_mode(&device->dev);
 
        device_init_wakeup(&device->dev, true);
        /*
index ad3c39e9e9f586d301abd572c83e76d554a5c382..e714ee6298dda8a66637aa918e33c861508ce15e 100644 (file)
@@ -736,7 +736,7 @@ static int acpi_add(struct acpi_device *device)
                default:
                        year = 2019;
                }
-       pr_info("product: %s  year: %d\n", product, year);
+       pr_info("product: %s  year: %d\n", product ?: "unknown", year);
 
        if (year >= 2019)
                battery_limit_use_wmbb = 1;
index 291f14ef67024a35befa2ab2418e69b8c94c8302..77244c9aa60d233dd35316d764158ab6dcc378ae 100644 (file)
@@ -264,6 +264,7 @@ static const struct key_entry toshiba_acpi_keymap[] = {
        { KE_KEY, 0xb32, { KEY_NEXTSONG } },
        { KE_KEY, 0xb33, { KEY_PLAYPAUSE } },
        { KE_KEY, 0xb5a, { KEY_MEDIA } },
+       { KE_IGNORE, 0x0e00, { KEY_RESERVED } }, /* Wake from sleep */
        { KE_IGNORE, 0x1430, { KEY_RESERVED } }, /* Wake from sleep */
        { KE_IGNORE, 0x1501, { KEY_RESERVED } }, /* Output changed */
        { KE_IGNORE, 0x1502, { KEY_RESERVED } }, /* HDMI plugged/unplugged */
@@ -3523,9 +3524,10 @@ static void toshiba_acpi_notify(struct acpi_device *acpi_dev, u32 event)
                                        (dev->kbd_mode == SCI_KBD_MODE_ON) ?
                                        LED_FULL : LED_OFF);
                break;
+       case 0x8e: /* Power button pressed */
+               break;
        case 0x85: /* Unknown */
        case 0x8d: /* Unknown */
-       case 0x8e: /* Unknown */
        case 0x94: /* Unknown */
        case 0x95: /* Unknown */
        default:
index a06f5f2d79329d615807fcc51064705accfdcc63..9c2f0dd42613d43a456974c0fd0018607e2867fe 100644 (file)
@@ -267,10 +267,17 @@ static const struct i2c_device_id tps65132_id[] = {
 };
 MODULE_DEVICE_TABLE(i2c, tps65132_id);
 
+static const struct of_device_id __maybe_unused tps65132_of_match[] = {
+       { .compatible = "ti,tps65132" },
+       {},
+};
+MODULE_DEVICE_TABLE(of, tps65132_of_match);
+
 static struct i2c_driver tps65132_i2c_driver = {
        .driver = {
                .name = "tps65132",
                .probe_type = PROBE_PREFER_ASYNCHRONOUS,
+               .of_match_table = of_match_ptr(tps65132_of_match),
        },
        .probe = tps65132_probe,
        .id_table = tps65132_id,
index affb05521e146f3ec6de7ffaf005517b6a0caba7..2c8e964425dc38ca80fa5009b17b4e9dc29bbf10 100644 (file)
@@ -14,8 +14,6 @@
 #include <linux/err.h>
 #include <linux/ctype.h>
 #include <linux/processor.h>
-#include <linux/dma-mapping.h>
-#include <linux/mm.h>
 
 #include "ism.h"
 
@@ -294,15 +292,13 @@ out:
 static void ism_free_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
 {
        clear_bit(dmb->sba_idx, ism->sba_bitmap);
-       dma_unmap_page(&ism->pdev->dev, dmb->dma_addr, dmb->dmb_len,
-                      DMA_FROM_DEVICE);
-       folio_put(virt_to_folio(dmb->cpu_addr));
+       dma_free_coherent(&ism->pdev->dev, dmb->dmb_len,
+                         dmb->cpu_addr, dmb->dma_addr);
 }
 
 static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
 {
        unsigned long bit;
-       int rc;
 
        if (PAGE_ALIGN(dmb->dmb_len) > dma_get_max_seg_size(&ism->pdev->dev))
                return -EINVAL;
@@ -319,30 +315,14 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
            test_and_set_bit(dmb->sba_idx, ism->sba_bitmap))
                return -EINVAL;
 
-       dmb->cpu_addr =
-               folio_address(folio_alloc(GFP_KERNEL | __GFP_NOWARN |
-                                         __GFP_NOMEMALLOC | __GFP_NORETRY,
-                                         get_order(dmb->dmb_len)));
+       dmb->cpu_addr = dma_alloc_coherent(&ism->pdev->dev, dmb->dmb_len,
+                                          &dmb->dma_addr,
+                                          GFP_KERNEL | __GFP_NOWARN |
+                                          __GFP_NOMEMALLOC | __GFP_NORETRY);
+       if (!dmb->cpu_addr)
+               clear_bit(dmb->sba_idx, ism->sba_bitmap);
 
-       if (!dmb->cpu_addr) {
-               rc = -ENOMEM;
-               goto out_bit;
-       }
-       dmb->dma_addr = dma_map_page(&ism->pdev->dev,
-                                    virt_to_page(dmb->cpu_addr), 0,
-                                    dmb->dmb_len, DMA_FROM_DEVICE);
-       if (dma_mapping_error(&ism->pdev->dev, dmb->dma_addr)) {
-               rc = -ENOMEM;
-               goto out_free;
-       }
-
-       return 0;
-
-out_free:
-       kfree(dmb->cpu_addr);
-out_bit:
-       clear_bit(dmb->sba_idx, ism->sba_bitmap);
-       return rc;
+       return dmb->cpu_addr ? 0 : -ENOMEM;
 }
 
 int ism_register_dmb(struct ism_dev *ism, struct ism_dmb *dmb,
index 097dfe4b620dce85736b8a0d5cf7f4b3c4842e9b..35f8e00850d6cb3e45063c2229c4f7532a9eae40 100644 (file)
@@ -1797,7 +1797,7 @@ static int hisi_sas_debug_I_T_nexus_reset(struct domain_device *device)
        if (dev_is_sata(device)) {
                struct ata_link *link = &device->sata_dev.ap->link;
 
-               rc = ata_wait_after_reset(link, HISI_SAS_WAIT_PHYUP_TIMEOUT,
+               rc = ata_wait_after_reset(link, jiffies + HISI_SAS_WAIT_PHYUP_TIMEOUT,
                                          smp_ata_check_ready_type);
        } else {
                msleep(2000);
index 7d2a33514538c2cd8083733d8303f4dc5934de7d..34f96cc35342bcb4ad2e4208d69b19073e6a9bb2 100644 (file)
@@ -2244,7 +2244,15 @@ slot_err_v3_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
        case SAS_PROTOCOL_SATA | SAS_PROTOCOL_STP:
                if ((dw0 & CMPLT_HDR_RSPNS_XFRD_MSK) &&
                    (sipc_rx_err_type & RX_FIS_STATUS_ERR_MSK)) {
-                       ts->stat = SAS_PROTO_RESPONSE;
+                       if (task->ata_task.use_ncq) {
+                               struct domain_device *device = task->dev;
+                               struct hisi_sas_device *sas_dev = device->lldd_dev;
+
+                               sas_dev->dev_status = HISI_SAS_DEV_NCQ_ERR;
+                               slot->abort = 1;
+                       } else {
+                               ts->stat = SAS_PROTO_RESPONSE;
+                       }
                } else if (dma_rx_err_type & RX_DATA_LEN_UNDERFLOW_MSK) {
                        ts->residual = trans_tx_fail_type;
                        ts->stat = SAS_DATA_UNDERRUN;
index 5c261005b74e47063172d3a0d10fb6e34cf74764..f6e6db8b8aba9133410834ee819f11dbaf1efc4b 100644 (file)
@@ -135,7 +135,7 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
 
 static inline void *alloc_smp_req(int size)
 {
-       u8 *p = kzalloc(size, GFP_KERNEL);
+       u8 *p = kzalloc(ALIGN(size, ARCH_DMA_MINALIGN), GFP_KERNEL);
        if (p)
                p[0] = SMP_REQUEST;
        return p;
index ca2e932dd9b7016a92649113df439627e4a1e32b..f684eb5e04898aff3a2d164bad4649dab716861f 100644 (file)
@@ -1775,9 +1775,9 @@ static ssize_t raid_state_show(struct device *dev,
 
                name = myrb_devstate_name(ldev_info->state);
                if (name)
-                       ret = snprintf(buf, 32, "%s\n", name);
+                       ret = snprintf(buf, 64, "%s\n", name);
                else
-                       ret = snprintf(buf, 32, "Invalid (%02X)\n",
+                       ret = snprintf(buf, 64, "Invalid (%02X)\n",
                                       ldev_info->state);
        } else {
                struct myrb_pdev_state *pdev_info = sdev->hostdata;
@@ -1796,9 +1796,9 @@ static ssize_t raid_state_show(struct device *dev,
                else
                        name = myrb_devstate_name(pdev_info->state);
                if (name)
-                       ret = snprintf(buf, 32, "%s\n", name);
+                       ret = snprintf(buf, 64, "%s\n", name);
                else
-                       ret = snprintf(buf, 32, "Invalid (%02X)\n",
+                       ret = snprintf(buf, 64, "Invalid (%02X)\n",
                                       pdev_info->state);
        }
        return ret;
@@ -1886,11 +1886,11 @@ static ssize_t raid_level_show(struct device *dev,
 
                name = myrb_raidlevel_name(ldev_info->raid_level);
                if (!name)
-                       return snprintf(buf, 32, "Invalid (%02X)\n",
+                       return snprintf(buf, 64, "Invalid (%02X)\n",
                                        ldev_info->state);
-               return snprintf(buf, 32, "%s\n", name);
+               return snprintf(buf, 64, "%s\n", name);
        }
-       return snprintf(buf, 32, "Physical Drive\n");
+       return snprintf(buf, 64, "Physical Drive\n");
 }
 static DEVICE_ATTR_RO(raid_level);
 
@@ -1903,15 +1903,15 @@ static ssize_t rebuild_show(struct device *dev,
        unsigned char status;
 
        if (sdev->channel < myrb_logical_channel(sdev->host))
-               return snprintf(buf, 32, "physical device - not rebuilding\n");
+               return snprintf(buf, 64, "physical device - not rebuilding\n");
 
        status = myrb_get_rbld_progress(cb, &rbld_buf);
 
        if (rbld_buf.ldev_num != sdev->id ||
            status != MYRB_STATUS_SUCCESS)
-               return snprintf(buf, 32, "not rebuilding\n");
+               return snprintf(buf, 64, "not rebuilding\n");
 
-       return snprintf(buf, 32, "rebuilding block %u of %u\n",
+       return snprintf(buf, 64, "rebuilding block %u of %u\n",
                        rbld_buf.ldev_size - rbld_buf.blocks_left,
                        rbld_buf.ldev_size);
 }
index a1eec65a9713f5bb79a25360f9554bc379f052b3..e824be9d9bbb94f1c1f88bf3da591d7e585dcf8d 100644 (file)
@@ -947,9 +947,9 @@ static ssize_t raid_state_show(struct device *dev,
 
                name = myrs_devstate_name(ldev_info->dev_state);
                if (name)
-                       ret = snprintf(buf, 32, "%s\n", name);
+                       ret = snprintf(buf, 64, "%s\n", name);
                else
-                       ret = snprintf(buf, 32, "Invalid (%02X)\n",
+                       ret = snprintf(buf, 64, "Invalid (%02X)\n",
                                       ldev_info->dev_state);
        } else {
                struct myrs_pdev_info *pdev_info;
@@ -958,9 +958,9 @@ static ssize_t raid_state_show(struct device *dev,
                pdev_info = sdev->hostdata;
                name = myrs_devstate_name(pdev_info->dev_state);
                if (name)
-                       ret = snprintf(buf, 32, "%s\n", name);
+                       ret = snprintf(buf, 64, "%s\n", name);
                else
-                       ret = snprintf(buf, 32, "Invalid (%02X)\n",
+                       ret = snprintf(buf, 64, "Invalid (%02X)\n",
                                       pdev_info->dev_state);
        }
        return ret;
@@ -1066,13 +1066,13 @@ static ssize_t raid_level_show(struct device *dev,
                ldev_info = sdev->hostdata;
                name = myrs_raid_level_name(ldev_info->raid_level);
                if (!name)
-                       return snprintf(buf, 32, "Invalid (%02X)\n",
+                       return snprintf(buf, 64, "Invalid (%02X)\n",
                                        ldev_info->dev_state);
 
        } else
                name = myrs_raid_level_name(MYRS_RAID_PHYSICAL);
 
-       return snprintf(buf, 32, "%s\n", name);
+       return snprintf(buf, 64, "%s\n", name);
 }
 static DEVICE_ATTR_RO(raid_level);
 
@@ -1086,7 +1086,7 @@ static ssize_t rebuild_show(struct device *dev,
        unsigned char status;
 
        if (sdev->channel < cs->ctlr_info->physchan_present)
-               return snprintf(buf, 32, "physical device - not rebuilding\n");
+               return snprintf(buf, 64, "physical device - not rebuilding\n");
 
        ldev_info = sdev->hostdata;
        ldev_num = ldev_info->ldev_num;
@@ -1098,11 +1098,11 @@ static ssize_t rebuild_show(struct device *dev,
                return -EIO;
        }
        if (ldev_info->rbld_active) {
-               return snprintf(buf, 32, "rebuilding block %zu of %zu\n",
+               return snprintf(buf, 64, "rebuilding block %zu of %zu\n",
                                (size_t)ldev_info->rbld_lba,
                                (size_t)ldev_info->cfg_devsize);
        } else
-               return snprintf(buf, 32, "not rebuilding\n");
+               return snprintf(buf, 64, "not rebuilding\n");
 }
 
 static ssize_t rebuild_store(struct device *dev,
@@ -1190,7 +1190,7 @@ static ssize_t consistency_check_show(struct device *dev,
        unsigned short ldev_num;
 
        if (sdev->channel < cs->ctlr_info->physchan_present)
-               return snprintf(buf, 32, "physical device - not checking\n");
+               return snprintf(buf, 64, "physical device - not checking\n");
 
        ldev_info = sdev->hostdata;
        if (!ldev_info)
@@ -1198,11 +1198,11 @@ static ssize_t consistency_check_show(struct device *dev,
        ldev_num = ldev_info->ldev_num;
        myrs_get_ldev_info(cs, ldev_num, ldev_info);
        if (ldev_info->cc_active)
-               return snprintf(buf, 32, "checking block %zu of %zu\n",
+               return snprintf(buf, 64, "checking block %zu of %zu\n",
                                (size_t)ldev_info->cc_lba,
                                (size_t)ldev_info->cfg_devsize);
        else
-               return snprintf(buf, 32, "not checking\n");
+               return snprintf(buf, 64, "not checking\n");
 }
 
 static ssize_t consistency_check_store(struct device *dev,
index 26e6b3e3af4317ca088941bc5bab37c15aebfd32..dcde55c8ee5deadd421b108087605c5c822c3b4c 100644 (file)
@@ -1100,7 +1100,7 @@ qla_edif_app_getstats(scsi_qla_host_t *vha, struct bsg_job *bsg_job)
 
                list_for_each_entry_safe(fcport, tf, &vha->vp_fcports, list) {
                        if (fcport->edif.enable) {
-                               if (pcnt > app_req.num_ports)
+                               if (pcnt >= app_req.num_ports)
                                        break;
 
                                app_reply->elem[pcnt].rekey_count =
index 3cf89867029044ede7b2f6a844dca974b8d50e32..58fdf679341dc64ee1768f1d09dc7ce4d549b4bd 100644 (file)
@@ -3920,7 +3920,7 @@ static int sd_probe(struct device *dev)
 
        error = device_add_disk(dev, gd, NULL);
        if (error) {
-               put_device(&sdkp->disk_dev);
+               device_unregister(&sdkp->disk_dev);
                put_disk(gd);
                goto out;
        }
index 386981c6976a53d668632457a47fcf1db609f5fd..baf870a03ecf6c6516f90e599188c659dc986bae 100644 (file)
@@ -285,6 +285,7 @@ sg_open(struct inode *inode, struct file *filp)
        int dev = iminor(inode);
        int flags = filp->f_flags;
        struct request_queue *q;
+       struct scsi_device *device;
        Sg_device *sdp;
        Sg_fd *sfp;
        int retval;
@@ -301,11 +302,12 @@ sg_open(struct inode *inode, struct file *filp)
 
        /* This driver's module count bumped by fops_get in <linux/fs.h> */
        /* Prevent the device driver from vanishing while we sleep */
-       retval = scsi_device_get(sdp->device);
+       device = sdp->device;
+       retval = scsi_device_get(device);
        if (retval)
                goto sg_put;
 
-       retval = scsi_autopm_get_device(sdp->device);
+       retval = scsi_autopm_get_device(device);
        if (retval)
                goto sdp_put;
 
@@ -313,7 +315,7 @@ sg_open(struct inode *inode, struct file *filp)
         * check if O_NONBLOCK. Permits SCSI commands to be issued
         * during error recovery. Tread carefully. */
        if (!((flags & O_NONBLOCK) ||
-             scsi_block_when_processing_errors(sdp->device))) {
+             scsi_block_when_processing_errors(device))) {
                retval = -ENXIO;
                /* we are in error recovery for this device */
                goto error_out;
@@ -344,7 +346,7 @@ sg_open(struct inode *inode, struct file *filp)
 
        if (sdp->open_cnt < 1) {  /* no existing opens */
                sdp->sgdebug = 0;
-               q = sdp->device->request_queue;
+               q = device->request_queue;
                sdp->sg_tablesize = queue_max_segments(q);
        }
        sfp = sg_add_sfp(sdp);
@@ -370,10 +372,11 @@ out_undo:
 error_mutex_locked:
        mutex_unlock(&sdp->open_rel_lock);
 error_out:
-       scsi_autopm_put_device(sdp->device);
+       scsi_autopm_put_device(device);
 sdp_put:
-       scsi_device_put(sdp->device);
-       goto sg_put;
+       kref_put(&sdp->d_ref, sg_device_destroy);
+       scsi_device_put(device);
+       return retval;
 }
 
 /* Release resources associated with a successful sg_open()
@@ -2233,7 +2236,6 @@ sg_remove_sfp_usercontext(struct work_struct *work)
                        "sg_remove_sfp: sfp=0x%p\n", sfp));
        kfree(sfp);
 
-       WARN_ON_ONCE(kref_read(&sdp->d_ref) != 1);
        kref_put(&sdp->d_ref, sg_device_destroy);
        scsi_device_put(device);
        module_put(THIS_MODULE);
index 079035db7dd8592aa62a21d5e88480028c5941bb..92a662d1b55cf2ed044fcbbfe96fd03ef0035736 100644 (file)
@@ -852,39 +852,39 @@ static int fsl_lpspi_probe(struct platform_device *pdev)
        fsl_lpspi->base = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
        if (IS_ERR(fsl_lpspi->base)) {
                ret = PTR_ERR(fsl_lpspi->base);
-               goto out_controller_put;
+               return ret;
        }
        fsl_lpspi->base_phys = res->start;
 
        irq = platform_get_irq(pdev, 0);
        if (irq < 0) {
                ret = irq;
-               goto out_controller_put;
+               return ret;
        }
 
        ret = devm_request_irq(&pdev->dev, irq, fsl_lpspi_isr, 0,
                               dev_name(&pdev->dev), fsl_lpspi);
        if (ret) {
                dev_err(&pdev->dev, "can't get irq%d: %d\n", irq, ret);
-               goto out_controller_put;
+               return ret;
        }
 
        fsl_lpspi->clk_per = devm_clk_get(&pdev->dev, "per");
        if (IS_ERR(fsl_lpspi->clk_per)) {
                ret = PTR_ERR(fsl_lpspi->clk_per);
-               goto out_controller_put;
+               return ret;
        }
 
        fsl_lpspi->clk_ipg = devm_clk_get(&pdev->dev, "ipg");
        if (IS_ERR(fsl_lpspi->clk_ipg)) {
                ret = PTR_ERR(fsl_lpspi->clk_ipg);
-               goto out_controller_put;
+               return ret;
        }
 
        /* enable the clock */
        ret = fsl_lpspi_init_rpm(fsl_lpspi);
        if (ret)
-               goto out_controller_put;
+               return ret;
 
        ret = pm_runtime_get_sync(fsl_lpspi->dev);
        if (ret < 0) {
@@ -945,8 +945,6 @@ out_pm_get:
        pm_runtime_dont_use_autosuspend(fsl_lpspi->dev);
        pm_runtime_put_sync(fsl_lpspi->dev);
        pm_runtime_disable(fsl_lpspi->dev);
-out_controller_put:
-       spi_controller_put(controller);
 
        return ret;
 }
index 969965d7bc98b538c6a9a0aa710e2083c7a4925a..cc18d320370f97523fae77bb5b34fc199b3e62e5 100644 (file)
@@ -725,6 +725,8 @@ static int pci1xxxx_spi_probe(struct pci_dev *pdev, const struct pci_device_id *
                spi_bus->spi_int[iter] = devm_kzalloc(&pdev->dev,
                                                      sizeof(struct pci1xxxx_spi_internal),
                                                      GFP_KERNEL);
+               if (!spi_bus->spi_int[iter])
+                       return -ENOMEM;
                spi_sub_ptr = spi_bus->spi_int[iter];
                spi_sub_ptr->spi_host = devm_spi_alloc_host(dev, sizeof(struct spi_controller));
                if (!spi_sub_ptr->spi_host)
index 9fcbe040cb2f2e0b8f8fb916d1c867c1c1949332..f726d86704287e56b5cffba6ed40d8f7e1ca4956 100644 (file)
@@ -430,7 +430,7 @@ static bool s3c64xx_spi_can_dma(struct spi_controller *host,
        struct s3c64xx_spi_driver_data *sdd = spi_controller_get_devdata(host);
 
        if (sdd->rx_dma.ch && sdd->tx_dma.ch)
-               return xfer->len > sdd->fifo_depth;
+               return xfer->len >= sdd->fifo_depth;
 
        return false;
 }
@@ -826,10 +826,9 @@ static int s3c64xx_spi_transfer_one(struct spi_controller *host,
                        return status;
        }
 
-       if (!is_polling(sdd) && (xfer->len > fifo_len) &&
+       if (!is_polling(sdd) && xfer->len >= fifo_len &&
            sdd->rx_dma.ch && sdd->tx_dma.ch) {
                use_dma = 1;
-
        } else if (xfer->len >= fifo_len) {
                tx_buf = xfer->tx_buf;
                rx_buf = xfer->rx_buf;
index c1fbcdd1618264f0cd09f5e4078ac600ad6dc22a..c40217f44b1bc53d149e8d5ea12c0e5297373800 100644 (file)
@@ -3672,6 +3672,8 @@ static int __init target_core_init_configfs(void)
 {
        struct configfs_subsystem *subsys = &target_core_fabrics;
        struct t10_alua_lu_gp *lu_gp;
+       struct cred *kern_cred;
+       const struct cred *old_cred;
        int ret;
 
        pr_debug("TARGET_CORE[0]: Loading Generic Kernel Storage"
@@ -3748,11 +3750,21 @@ static int __init target_core_init_configfs(void)
        if (ret < 0)
                goto out;
 
+       /* We use the kernel credentials to access the target directory */
+       kern_cred = prepare_kernel_cred(&init_task);
+       if (!kern_cred) {
+               ret = -ENOMEM;
+               goto out;
+       }
+       old_cred = override_creds(kern_cred);
        target_init_dbroot();
+       revert_creds(old_cred);
+       put_cred(kern_cred);
 
        return 0;
 
 out:
+       target_xcopy_release_pt();
        configfs_unregister_subsystem(subsys);
        core_dev_release_virtual_lun0();
        rd_module_exit();
index 1b17dc4c219cc94aae8bf030526298aca3e29eaa..e25e48d76aa79c843e6873fa2ee8bc1a830bc7f5 100644 (file)
@@ -606,7 +606,7 @@ static int allocate_actors_buffer(struct power_allocator_params *params,
 
        /* There might be no cooling devices yet. */
        if (!num_actors) {
-               ret = -EINVAL;
+               ret = 0;
                goto clean_state;
        }
 
@@ -679,11 +679,6 @@ static int power_allocator_bind(struct thermal_zone_device *tz)
                return -ENOMEM;
 
        get_governor_trips(tz, params);
-       if (!params->trip_max) {
-               dev_warn(&tz->device, "power_allocator: missing trip_max\n");
-               kfree(params);
-               return -EINVAL;
-       }
 
        ret = check_power_actors(tz, params);
        if (ret < 0) {
@@ -714,9 +709,10 @@ static int power_allocator_bind(struct thermal_zone_device *tz)
        else
                params->sustainable_power = tz->tzp->sustainable_power;
 
-       estimate_pid_constants(tz, tz->tzp->sustainable_power,
-                              params->trip_switch_on,
-                              params->trip_max->temperature);
+       if (params->trip_max)
+               estimate_pid_constants(tz, tz->tzp->sustainable_power,
+                                      params->trip_switch_on,
+                                      params->trip_max->temperature);
 
        reset_pid_controller(params);
 
index e30fd125988d7a8ca521d6fb30e97c671f269732..a0f8e930167d70aab48b315076fe76f9924e34b4 100644 (file)
@@ -3217,7 +3217,9 @@ retry:
 
                /* MCQ mode */
                if (is_mcq_enabled(hba)) {
-                       err = ufshcd_clear_cmd(hba, lrbp->task_tag);
+                       /* successfully cleared the command, retry if needed */
+                       if (ufshcd_clear_cmd(hba, lrbp->task_tag) == 0)
+                               err = -EAGAIN;
                        hba->dev_cmd.complete = NULL;
                        return err;
                }
@@ -9791,7 +9793,10 @@ static int __ufshcd_wl_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 
        /* UFS device & link must be active before we enter in this function */
        if (!ufshcd_is_ufs_dev_active(hba) || !ufshcd_is_link_active(hba)) {
-               ret = -EINVAL;
+               /* Wait for err handler to finish or trigger err recovery */
+               if (!ufshcd_eh_in_progress(hba))
+                       ufshcd_force_error_recovery(hba);
+               ret = -EBUSY;
                goto enable_scaling;
        }
 
index 9cdaa2faa5363333627e0cba54a4efe75b45b144..0f4f531c97800c648437fb2eb7409ccc2b198536 100644 (file)
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1202,8 +1202,8 @@ static void aio_complete(struct aio_kiocb *iocb)
                spin_lock_irqsave(&ctx->wait.lock, flags);
                list_for_each_entry_safe(curr, next, &ctx->wait.head, w.entry)
                        if (avail >= curr->min_nr) {
-                               list_del_init_careful(&curr->w.entry);
                                wake_up_process(curr->w.private);
+                               list_del_init_careful(&curr->w.entry);
                        }
                spin_unlock_irqrestore(&ctx->wait.lock, flags);
        }
index 3640f417cce118b06e43ae4c8b38bb275b0097fc..5c180fdc3efbdf09791c7941f3f1522cd6d6f9dc 100644 (file)
@@ -281,7 +281,6 @@ struct posix_acl *bch2_get_acl(struct mnt_idmap *idmap,
        struct xattr_search_key search = X_SEARCH(acl_to_xattr_type(type), "", 0);
        struct btree_trans *trans = bch2_trans_get(c);
        struct btree_iter iter = { NULL };
-       struct bkey_s_c_xattr xattr;
        struct posix_acl *acl = NULL;
        struct bkey_s_c k;
        int ret;
@@ -290,28 +289,27 @@ retry:
 
        ret = bch2_hash_lookup(trans, &iter, bch2_xattr_hash_desc,
                        &hash, inode_inum(inode), &search, 0);
-       if (ret) {
-               if (!bch2_err_matches(ret, ENOENT))
-                       acl = ERR_PTR(ret);
-               goto out;
-       }
+       if (ret)
+               goto err;
 
        k = bch2_btree_iter_peek_slot(&iter);
        ret = bkey_err(k);
-       if (ret) {
-               acl = ERR_PTR(ret);
-               goto out;
-       }
+       if (ret)
+               goto err;
 
-       xattr = bkey_s_c_to_xattr(k);
+       struct bkey_s_c_xattr xattr = bkey_s_c_to_xattr(k);
        acl = bch2_acl_from_disk(trans, xattr_val(xattr.v),
-                       le16_to_cpu(xattr.v->x_val_len));
+                                le16_to_cpu(xattr.v->x_val_len));
+       ret = PTR_ERR_OR_ZERO(acl);
+err:
+       if (bch2_err_matches(ret, BCH_ERR_transaction_restart))
+               goto retry;
 
-       if (!IS_ERR(acl))
+       if (ret)
+               acl = !bch2_err_matches(ret, ENOENT) ? ERR_PTR(ret) : NULL;
+
+       if (!IS_ERR_OR_NULL(acl))
                set_cached_acl(&inode->v, type, acl);
-out:
-       if (bch2_err_matches(PTR_ERR_OR_ZERO(acl), BCH_ERR_transaction_restart))
-               goto retry;
 
        bch2_trans_iter_exit(trans, &iter);
        bch2_trans_put(trans);
index 63102992d9556d1b33b445a3116df61964a6ca01..364ae42022af1750f9887d3f16c566a676b30776 100644 (file)
@@ -1535,6 +1535,20 @@ enum btree_id {
        BTREE_ID_NR
 };
 
+static inline bool btree_id_is_alloc(enum btree_id id)
+{
+       switch (id) {
+       case BTREE_ID_alloc:
+       case BTREE_ID_backpointers:
+       case BTREE_ID_need_discard:
+       case BTREE_ID_freespace:
+       case BTREE_ID_bucket_gens:
+               return true;
+       default:
+               return false;
+       }
+}
+
 #define BTREE_MAX_DEPTH                4U
 
 /* Btree nodes */
index 6280da1244b55032beaf60c4e2b29df0ff2c3152..d2555da55c6da3750af9fab2538e3653118a48d9 100644 (file)
@@ -368,11 +368,16 @@ again:
                                buf.buf)) {
                        bch2_btree_node_evict(trans, cur_k.k);
                        cur = NULL;
-                       ret =   bch2_run_explicit_recovery_pass(c, BCH_RECOVERY_PASS_scan_for_btree_nodes) ?:
-                               bch2_journal_key_delete(c, b->c.btree_id,
-                                                       b->c.level, cur_k.k->k.p);
+                       ret = bch2_journal_key_delete(c, b->c.btree_id,
+                                                     b->c.level, cur_k.k->k.p);
                        if (ret)
                                break;
+
+                       if (!btree_id_is_alloc(b->c.btree_id)) {
+                               ret = bch2_run_explicit_recovery_pass(c, BCH_RECOVERY_PASS_scan_for_btree_nodes);
+                               if (ret)
+                                       break;
+                       }
                        continue;
                }
 
@@ -544,12 +549,12 @@ reconstruct_root:
                                bch2_btree_root_alloc_fake(c, i, 0);
                        } else {
                                bch2_btree_root_alloc_fake(c, i, 1);
+                               bch2_shoot_down_journal_keys(c, i, 1, BTREE_MAX_DEPTH, POS_MIN, SPOS_MAX);
                                ret = bch2_get_scanned_nodes(c, i, 0, POS_MIN, SPOS_MAX);
                                if (ret)
                                        break;
                        }
 
-                       bch2_shoot_down_journal_keys(c, i, 1, BTREE_MAX_DEPTH, POS_MIN, SPOS_MAX);
                        reconstructed_root = true;
                }
 
index 24772538e4cc74ada59851bd7847dd5ece5ea122..1d58d447b386cdf74ecce2609f8aa23851ae4057 100644 (file)
@@ -642,7 +642,7 @@ int __bch2_btree_trans_too_many_iters(struct btree_trans *);
 
 static inline int btree_trans_too_many_iters(struct btree_trans *trans)
 {
-       if (bitmap_weight(trans->paths_allocated, trans->nr_paths) > BTREE_ITER_INITIAL - 8)
+       if (bitmap_weight(trans->paths_allocated, trans->nr_paths) > BTREE_ITER_NORMAL_LIMIT - 8)
                return __bch2_btree_trans_too_many_iters(trans);
 
        return 0;
index 5cbcbfe85235b8de3777ae82b120d4627f99c8d7..1e8cf49a69353198774a0e5b798c2f1f135041fa 100644 (file)
@@ -130,12 +130,30 @@ struct bkey_i *bch2_journal_keys_peek_slot(struct bch_fs *c, enum btree_id btree
        return bch2_journal_keys_peek_upto(c, btree_id, level, pos, pos, &idx);
 }
 
+static void journal_iter_verify(struct journal_iter *iter)
+{
+       struct journal_keys *keys = iter->keys;
+       size_t gap_size = keys->size - keys->nr;
+
+       BUG_ON(iter->idx >= keys->gap &&
+              iter->idx <  keys->gap + gap_size);
+
+       if (iter->idx < keys->size) {
+               struct journal_key *k = keys->data + iter->idx;
+
+               int cmp = cmp_int(k->btree_id,  iter->btree_id) ?:
+                         cmp_int(k->level,     iter->level);
+               BUG_ON(cmp < 0);
+       }
+}
+
 static void journal_iters_fix(struct bch_fs *c)
 {
        struct journal_keys *keys = &c->journal_keys;
        /* The key we just inserted is immediately before the gap: */
        size_t gap_end = keys->gap + (keys->size - keys->nr);
-       struct btree_and_journal_iter *iter;
+       struct journal_key *new_key = &keys->data[keys->gap - 1];
+       struct journal_iter *iter;
 
        /*
         * If an iterator points one after the key we just inserted, decrement
@@ -143,9 +161,14 @@ static void journal_iters_fix(struct bch_fs *c)
         * decrement was unnecessary, bch2_btree_and_journal_iter_peek() will
         * handle that:
         */
-       list_for_each_entry(iter, &c->journal_iters, journal.list)
-               if (iter->journal.idx == gap_end)
-                       iter->journal.idx = keys->gap - 1;
+       list_for_each_entry(iter, &c->journal_iters, list) {
+               journal_iter_verify(iter);
+               if (iter->idx           == gap_end &&
+                   new_key->btree_id   == iter->btree_id &&
+                   new_key->level      == iter->level)
+                       iter->idx = keys->gap - 1;
+               journal_iter_verify(iter);
+       }
 }
 
 static void journal_iters_move_gap(struct bch_fs *c, size_t old_gap, size_t new_gap)
@@ -192,7 +215,12 @@ int bch2_journal_key_insert_take(struct bch_fs *c, enum btree_id id,
        if (idx > keys->gap)
                idx -= keys->size - keys->nr;
 
+       size_t old_gap = keys->gap;
+
        if (keys->nr == keys->size) {
+               journal_iters_move_gap(c, old_gap, keys->size);
+               old_gap = keys->size;
+
                struct journal_keys new_keys = {
                        .nr                     = keys->nr,
                        .size                   = max_t(size_t, keys->size, 8) * 2,
@@ -216,7 +244,7 @@ int bch2_journal_key_insert_take(struct bch_fs *c, enum btree_id id,
                keys->gap       = keys->nr;
        }
 
-       journal_iters_move_gap(c, keys->gap, idx);
+       journal_iters_move_gap(c, old_gap, idx);
 
        move_gap(keys, idx);
 
@@ -301,16 +329,21 @@ static void bch2_journal_iter_advance(struct journal_iter *iter)
 
 static struct bkey_s_c bch2_journal_iter_peek(struct journal_iter *iter)
 {
-       struct journal_key *k = iter->keys->data + iter->idx;
+       journal_iter_verify(iter);
+
+       while (iter->idx < iter->keys->size) {
+               struct journal_key *k = iter->keys->data + iter->idx;
+
+               int cmp = cmp_int(k->btree_id,  iter->btree_id) ?:
+                         cmp_int(k->level,     iter->level);
+               if (cmp > 0)
+                       break;
+               BUG_ON(cmp);
 
-       while (k < iter->keys->data + iter->keys->size &&
-              k->btree_id      == iter->btree_id &&
-              k->level         == iter->level) {
                if (!k->overwritten)
                        return bkey_i_to_s_c(k->k);
 
                bch2_journal_iter_advance(iter);
-               k = iter->keys->data + iter->idx;
        }
 
        return bkey_s_c_null;
@@ -330,6 +363,8 @@ static void bch2_journal_iter_init(struct bch_fs *c,
        iter->level     = level;
        iter->keys      = &c->journal_keys;
        iter->idx       = bch2_journal_key_search(&c->journal_keys, id, level, pos);
+
+       journal_iter_verify(iter);
 }
 
 static struct bkey_s_c bch2_journal_iter_peek_btree(struct btree_and_journal_iter *iter)
@@ -434,10 +469,15 @@ void __bch2_btree_and_journal_iter_init_node_iter(struct btree_trans *trans,
        iter->trans = trans;
        iter->b = b;
        iter->node_iter = node_iter;
-       bch2_journal_iter_init(trans->c, &iter->journal, b->c.btree_id, b->c.level, pos);
-       INIT_LIST_HEAD(&iter->journal.list);
        iter->pos = b->data->min_key;
        iter->at_end = false;
+       INIT_LIST_HEAD(&iter->journal.list);
+
+       if (trans->journal_replay_not_finished) {
+               bch2_journal_iter_init(trans->c, &iter->journal, b->c.btree_id, b->c.level, pos);
+               if (!test_bit(BCH_FS_may_go_rw, &trans->c->flags))
+                       list_add(&iter->journal.list, &trans->c->journal_iters);
+       }
 }
 
 /*
@@ -452,9 +492,6 @@ void bch2_btree_and_journal_iter_init_node_iter(struct btree_trans *trans,
 
        bch2_btree_node_iter_init_from_start(&node_iter, b);
        __bch2_btree_and_journal_iter_init_node_iter(trans, iter, b, node_iter, b->data->min_key);
-       if (trans->journal_replay_not_finished &&
-           !test_bit(BCH_FS_may_go_rw, &trans->c->flags))
-               list_add(&iter->journal.list, &trans->c->journal_iters);
 }
 
 /* sort and dedup all keys in the journal: */
index 581edcb0911bfa39e9ec6242686bd213c47f352c..88a3582a32757e34a28eb37143f9ff78a88a4085 100644 (file)
@@ -169,6 +169,7 @@ static void bkey_cached_move_to_freelist(struct btree_key_cache *bc,
        } else {
                mutex_lock(&bc->lock);
                list_move_tail(&ck->list, &bc->freed_pcpu);
+               bc->nr_freed_pcpu++;
                mutex_unlock(&bc->lock);
        }
 }
@@ -245,6 +246,7 @@ bkey_cached_alloc(struct btree_trans *trans, struct btree_path *path,
                if (!list_empty(&bc->freed_pcpu)) {
                        ck = list_last_entry(&bc->freed_pcpu, struct bkey_cached, list);
                        list_del_init(&ck->list);
+                       bc->nr_freed_pcpu--;
                }
                mutex_unlock(&bc->lock);
        }
@@ -659,7 +661,7 @@ static int btree_key_cache_flush_pos(struct btree_trans *trans,
                commit_flags |= BCH_WATERMARK_reclaim;
 
        if (ck->journal.seq != journal_last_seq(j) ||
-           j->watermark == BCH_WATERMARK_stripe)
+           !test_bit(JOURNAL_SPACE_LOW, &c->journal.flags))
                commit_flags |= BCH_TRANS_COMMIT_no_journal_res;
 
        ret   = bch2_btree_iter_traverse(&b_iter) ?:
index b9b151e693ed60ecc3dc9147cc34902643cfc7aa..f2caf491957efc2345c082323516e58fe2a35302 100644 (file)
@@ -440,33 +440,7 @@ void bch2_btree_node_lock_write_nofail(struct btree_trans *trans,
                                       struct btree_path *path,
                                       struct btree_bkey_cached_common *b)
 {
-       struct btree_path *linked;
-       unsigned i, iter;
-       int ret;
-
-       /*
-        * XXX BIG FAT NOTICE
-        *
-        * Drop all read locks before taking a write lock:
-        *
-        * This is a hack, because bch2_btree_node_lock_write_nofail() is a
-        * hack - but by dropping read locks first, this should never fail, and
-        * we only use this in code paths where whatever read locks we've
-        * already taken are no longer needed:
-        */
-
-       trans_for_each_path(trans, linked, iter) {
-               if (!linked->nodes_locked)
-                       continue;
-
-               for (i = 0; i < BTREE_MAX_DEPTH; i++)
-                       if (btree_node_read_locked(linked, i)) {
-                               btree_node_unlock(trans, linked, i);
-                               btree_path_set_dirty(linked, BTREE_ITER_NEED_RELOCK);
-                       }
-       }
-
-       ret = __btree_node_lock_write(trans, path, b, true);
+       int ret = __btree_node_lock_write(trans, path, b, true);
        BUG_ON(ret);
 }
 
index 3f33be7e5e5c26d9be6d0f1431ee04c3e545b825..556f76f5c84e1613c332e7443e6bb8b1602dd359 100644 (file)
@@ -133,6 +133,9 @@ static void try_read_btree_node(struct find_btree_nodes *f, struct bch_dev *ca,
        if (le64_to_cpu(bn->magic) != bset_magic(c))
                return;
 
+       if (btree_id_is_alloc(BTREE_NODE_ID(bn)))
+               return;
+
        rcu_read_lock();
        struct found_btree_node n = {
                .btree_id       = BTREE_NODE_ID(bn),
@@ -213,6 +216,9 @@ static int read_btree_nodes(struct find_btree_nodes *f)
        closure_init_stack(&cl);
 
        for_each_online_member(c, ca) {
+               if (!(ca->mi.data_allowed & BIT(BCH_DATA_btree)))
+                       continue;
+
                struct find_btree_nodes_worker *w = kmalloc(sizeof(*w), GFP_KERNEL);
                struct task_struct *t;
 
@@ -290,7 +296,7 @@ again:
                        found_btree_node_to_text(&buf, c, n);
                        bch_err(c, "%s", buf.buf);
                        printbuf_exit(&buf);
-                       return -1;
+                       return -BCH_ERR_fsck_repair_unimplemented;
                }
        }
 
@@ -436,6 +442,9 @@ bool bch2_btree_has_scanned_nodes(struct bch_fs *c, enum btree_id btree)
 int bch2_get_scanned_nodes(struct bch_fs *c, enum btree_id btree,
                           unsigned level, struct bpos node_min, struct bpos node_max)
 {
+       if (btree_id_is_alloc(btree))
+               return 0;
+
        struct find_btree_nodes *f = &c->found_btree_nodes;
 
        int ret = bch2_run_explicit_recovery_pass(c, BCH_RECOVERY_PASS_scan_for_btree_nodes);
index 9404d96c38f3b368726a6603b601b241b5106100..e0c982a4195c764ab8a415b5f7f80cbff88c1935 100644 (file)
@@ -364,7 +364,21 @@ struct btree_insert_entry {
        unsigned long           ip_allocated;
 };
 
+/* Number of btree paths we preallocate, usually enough */
 #define BTREE_ITER_INITIAL             64
+/*
+ * Limit for btree_trans_too_many_iters(); this is enough that almost all code
+ * paths should run inside this limit, and if they don't it usually indicates a
+ * bug (leaking/duplicated btree paths).
+ *
+ * exception: some fsck paths
+ *
+ * bugs with excessive path usage seem to have possibly been eliminated now, so
+ * we might consider eliminating this (and btree_trans_too_many_iters()) at some
+ * point.
+ */
+#define BTREE_ITER_NORMAL_LIMIT                256
+/* never exceed limit */
 #define BTREE_ITER_MAX                 (1U << 10)
 
 struct btree_trans_commit_hook;
index 32397b99752fd2ec3cfd553724c97c7f217ca56e..c4a5e83a56a436548263445e3b7a329644757cc7 100644 (file)
@@ -26,9 +26,9 @@
 
 #include <linux/random.h>
 
-const char * const bch2_btree_update_modes[] = {
+static const char * const bch2_btree_update_modes[] = {
 #define x(t) #t,
-       BCH_WATERMARKS()
+       BTREE_UPDATE_MODES()
 #undef x
        NULL
 };
@@ -704,9 +704,13 @@ static void btree_update_nodes_written(struct btree_update *as)
        bch2_fs_fatal_err_on(ret && !bch2_journal_error(&c->journal), c,
                             "%s", bch2_err_str(ret));
 err:
-       if (as->b) {
-
-               b = as->b;
+       /*
+        * We have to be careful because another thread might be getting ready
+        * to free as->b and calling btree_update_reparent() on us - we'll
+        * recheck under btree_update_lock below:
+        */
+       b = READ_ONCE(as->b);
+       if (b) {
                btree_path_idx_t path_idx = get_unlocked_mut_path(trans,
                                                as->btree_id, b->c.level, b->key.k.p);
                struct btree_path *path = trans->paths + path_idx;
@@ -850,15 +854,17 @@ static void btree_update_updated_node(struct btree_update *as, struct btree *b)
 {
        struct bch_fs *c = as->c;
 
-       mutex_lock(&c->btree_interior_update_lock);
-       list_add_tail(&as->unwritten_list, &c->btree_interior_updates_unwritten);
-
        BUG_ON(as->mode != BTREE_UPDATE_none);
+       BUG_ON(as->update_level_end < b->c.level);
        BUG_ON(!btree_node_dirty(b));
        BUG_ON(!b->c.level);
 
+       mutex_lock(&c->btree_interior_update_lock);
+       list_add_tail(&as->unwritten_list, &c->btree_interior_updates_unwritten);
+
        as->mode        = BTREE_UPDATE_node;
        as->b           = b;
+       as->update_level_end = b->c.level;
 
        set_btree_node_write_blocked(b);
        list_add(&as->write_blocked_list, &b->write_blocked);
@@ -1100,7 +1106,7 @@ static void bch2_btree_update_done(struct btree_update *as, struct btree_trans *
 
 static struct btree_update *
 bch2_btree_update_start(struct btree_trans *trans, struct btree_path *path,
-                       unsigned level, bool split, unsigned flags)
+                       unsigned level_start, bool split, unsigned flags)
 {
        struct bch_fs *c = trans->c;
        struct btree_update *as;
@@ -1108,7 +1114,7 @@ bch2_btree_update_start(struct btree_trans *trans, struct btree_path *path,
        int disk_res_flags = (flags & BCH_TRANS_COMMIT_no_enospc)
                ? BCH_DISK_RESERVATION_NOFAIL : 0;
        unsigned nr_nodes[2] = { 0, 0 };
-       unsigned update_level = level;
+       unsigned level_end = level_start;
        enum bch_watermark watermark = flags & BCH_WATERMARK_MASK;
        int ret = 0;
        u32 restart_count = trans->restart_count;
@@ -1123,34 +1129,30 @@ bch2_btree_update_start(struct btree_trans *trans, struct btree_path *path,
        flags &= ~BCH_WATERMARK_MASK;
        flags |= watermark;
 
-       if (watermark < c->journal.watermark) {
-               struct journal_res res = { 0 };
-               unsigned journal_flags = watermark|JOURNAL_RES_GET_CHECK;
+       if (watermark < BCH_WATERMARK_reclaim &&
+           test_bit(JOURNAL_SPACE_LOW, &c->journal.flags)) {
+               if (flags & BCH_TRANS_COMMIT_journal_reclaim)
+                       return ERR_PTR(-BCH_ERR_journal_reclaim_would_deadlock);
 
-               if ((flags & BCH_TRANS_COMMIT_journal_reclaim) &&
-                   watermark < BCH_WATERMARK_reclaim)
-                       journal_flags |= JOURNAL_RES_GET_NONBLOCK;
-
-               ret = drop_locks_do(trans,
-                       bch2_journal_res_get(&c->journal, &res, 1, journal_flags));
-               if (bch2_err_matches(ret, BCH_ERR_operation_blocked))
-                       ret = -BCH_ERR_journal_reclaim_would_deadlock;
+               bch2_trans_unlock(trans);
+               wait_event(c->journal.wait, !test_bit(JOURNAL_SPACE_LOW, &c->journal.flags));
+               ret = bch2_trans_relock(trans);
                if (ret)
                        return ERR_PTR(ret);
        }
 
        while (1) {
-               nr_nodes[!!update_level] += 1 + split;
-               update_level++;
+               nr_nodes[!!level_end] += 1 + split;
+               level_end++;
 
-               ret = bch2_btree_path_upgrade(trans, path, update_level + 1);
+               ret = bch2_btree_path_upgrade(trans, path, level_end + 1);
                if (ret)
                        return ERR_PTR(ret);
 
-               if (!btree_path_node(path, update_level)) {
+               if (!btree_path_node(path, level_end)) {
                        /* Allocating new root? */
                        nr_nodes[1] += split;
-                       update_level = BTREE_MAX_DEPTH;
+                       level_end = BTREE_MAX_DEPTH;
                        break;
                }
 
@@ -1158,11 +1160,11 @@ bch2_btree_update_start(struct btree_trans *trans, struct btree_path *path,
                 * Always check for space for two keys, even if we won't have to
                 * split at prior level - it might have been a merge instead:
                 */
-               if (bch2_btree_node_insert_fits(path->l[update_level].b,
+               if (bch2_btree_node_insert_fits(path->l[level_end].b,
                                                BKEY_BTREE_PTR_U64s_MAX * 2))
                        break;
 
-               split = path->l[update_level].b->nr.live_u64s > BTREE_SPLIT_THRESHOLD(c);
+               split = path->l[level_end].b->nr.live_u64s > BTREE_SPLIT_THRESHOLD(c);
        }
 
        if (!down_read_trylock(&c->gc_lock)) {
@@ -1176,14 +1178,15 @@ bch2_btree_update_start(struct btree_trans *trans, struct btree_path *path,
        as = mempool_alloc(&c->btree_interior_update_pool, GFP_NOFS);
        memset(as, 0, sizeof(*as));
        closure_init(&as->cl, NULL);
-       as->c           = c;
-       as->start_time  = start_time;
-       as->ip_started  = _RET_IP_;
-       as->mode        = BTREE_UPDATE_none;
-       as->watermark   = watermark;
-       as->took_gc_lock = true;
-       as->btree_id    = path->btree_id;
-       as->update_level = update_level;
+       as->c                   = c;
+       as->start_time          = start_time;
+       as->ip_started          = _RET_IP_;
+       as->mode                = BTREE_UPDATE_none;
+       as->watermark           = watermark;
+       as->took_gc_lock        = true;
+       as->btree_id            = path->btree_id;
+       as->update_level_start  = level_start;
+       as->update_level_end    = level_end;
        INIT_LIST_HEAD(&as->list);
        INIT_LIST_HEAD(&as->unwritten_list);
        INIT_LIST_HEAD(&as->write_blocked_list);
@@ -1373,12 +1376,12 @@ static void bch2_insert_fixup_btree_ptr(struct btree_update *as,
 }
 
 static void
-__bch2_btree_insert_keys_interior(struct btree_update *as,
-                                 struct btree_trans *trans,
-                                 struct btree_path *path,
-                                 struct btree *b,
-                                 struct btree_node_iter node_iter,
-                                 struct keylist *keys)
+bch2_btree_insert_keys_interior(struct btree_update *as,
+                               struct btree_trans *trans,
+                               struct btree_path *path,
+                               struct btree *b,
+                               struct btree_node_iter node_iter,
+                               struct keylist *keys)
 {
        struct bkey_i *insert = bch2_keylist_front(keys);
        struct bkey_packed *k;
@@ -1534,7 +1537,7 @@ static void btree_split_insert_keys(struct btree_update *as,
 
                bch2_btree_node_iter_init(&node_iter, b, &bch2_keylist_front(keys)->k.p);
 
-               __bch2_btree_insert_keys_interior(as, trans, path, b, node_iter, keys);
+               bch2_btree_insert_keys_interior(as, trans, path, b, node_iter, keys);
 
                BUG_ON(bch2_btree_node_check_topology(trans, b));
        }
@@ -1714,27 +1717,6 @@ err:
        goto out;
 }
 
-static void
-bch2_btree_insert_keys_interior(struct btree_update *as,
-                               struct btree_trans *trans,
-                               struct btree_path *path,
-                               struct btree *b,
-                               struct keylist *keys)
-{
-       struct btree_path *linked;
-       unsigned i;
-
-       __bch2_btree_insert_keys_interior(as, trans, path, b,
-                                         path->l[b->c.level].iter, keys);
-
-       btree_update_updated_node(as, b);
-
-       trans_for_each_path_with_node(trans, b, linked, i)
-               bch2_btree_node_iter_peek(&linked->l[b->c.level].iter, b);
-
-       bch2_trans_verify_paths(trans);
-}
-
 /**
  * bch2_btree_insert_node - insert bkeys into a given btree node
  *
@@ -1755,7 +1737,8 @@ static int bch2_btree_insert_node(struct btree_update *as, struct btree_trans *t
                                  struct keylist *keys)
 {
        struct bch_fs *c = as->c;
-       struct btree_path *path = trans->paths + path_idx;
+       struct btree_path *path = trans->paths + path_idx, *linked;
+       unsigned i;
        int old_u64s = le16_to_cpu(btree_bset_last(b)->u64s);
        int old_live_u64s = b->nr.live_u64s;
        int live_u64s_added, u64s_added;
@@ -1784,7 +1767,13 @@ static int bch2_btree_insert_node(struct btree_update *as, struct btree_trans *t
                return ret;
        }
 
-       bch2_btree_insert_keys_interior(as, trans, path, b, keys);
+       bch2_btree_insert_keys_interior(as, trans, path, b,
+                                       path->l[b->c.level].iter, keys);
+
+       trans_for_each_path_with_node(trans, b, linked, i)
+               bch2_btree_node_iter_peek(&linked->l[b->c.level].iter, b);
+
+       bch2_trans_verify_paths(trans);
 
        live_u64s_added = (int) b->nr.live_u64s - old_live_u64s;
        u64s_added = (int) le16_to_cpu(btree_bset_last(b)->u64s) - old_u64s;
@@ -1798,6 +1787,7 @@ static int bch2_btree_insert_node(struct btree_update *as, struct btree_trans *t
            bch2_maybe_compact_whiteouts(c, b))
                bch2_trans_node_reinit_iter(trans, b);
 
+       btree_update_updated_node(as, b);
        bch2_btree_node_unlock_write(trans, path, b);
 
        BUG_ON(bch2_btree_node_check_topology(trans, b));
@@ -1807,7 +1797,7 @@ split:
         * We could attempt to avoid the transaction restart, by calling
         * bch2_btree_path_upgrade() and allocating more nodes:
         */
-       if (b->c.level >= as->update_level) {
+       if (b->c.level >= as->update_level_end) {
                trace_and_count(c, trans_restart_split_race, trans, _THIS_IP_, b);
                return btree_trans_restart(trans, BCH_ERR_transaction_restart_split_race);
        }
@@ -2519,9 +2509,11 @@ void bch2_btree_root_alloc_fake(struct bch_fs *c, enum btree_id id, unsigned lev
 
 static void bch2_btree_update_to_text(struct printbuf *out, struct btree_update *as)
 {
-       prt_printf(out, "%ps: btree=%s watermark=%s mode=%s nodes_written=%u cl.remaining=%u journal_seq=%llu\n",
+       prt_printf(out, "%ps: btree=%s l=%u-%u watermark=%s mode=%s nodes_written=%u cl.remaining=%u journal_seq=%llu\n",
                   (void *) as->ip_started,
                   bch2_btree_id_str(as->btree_id),
+                  as->update_level_start,
+                  as->update_level_end,
                   bch2_watermarks[as->watermark],
                   bch2_btree_update_modes[as->mode],
                   as->nodes_written,
index 88dcf5a22a3bd628aaa22065f3cdd70ca3770d90..c1a479ebaad12120813f95a4af50b32cd542023d 100644 (file)
@@ -57,7 +57,8 @@ struct btree_update {
        unsigned                        took_gc_lock:1;
 
        enum btree_id                   btree_id;
-       unsigned                        update_level;
+       unsigned                        update_level_start;
+       unsigned                        update_level_end;
 
        struct disk_reservation         disk_res;
 
index cbfa6459bdbceec6a953f91a385fc5e4fe76691d..72781aad6ba70ccc774b688c6a9d50b2dc21f133 100644 (file)
@@ -134,42 +134,38 @@ static long bch2_ioctl_incremental(struct bch_ioctl_incremental __user *user_arg
 struct fsck_thread {
        struct thread_with_stdio thr;
        struct bch_fs           *c;
-       char                    **devs;
-       size_t                  nr_devs;
        struct bch_opts         opts;
 };
 
 static void bch2_fsck_thread_exit(struct thread_with_stdio *_thr)
 {
        struct fsck_thread *thr = container_of(_thr, struct fsck_thread, thr);
-       if (thr->devs)
-               for (size_t i = 0; i < thr->nr_devs; i++)
-                       kfree(thr->devs[i]);
-       kfree(thr->devs);
        kfree(thr);
 }
 
 static int bch2_fsck_offline_thread_fn(struct thread_with_stdio *stdio)
 {
        struct fsck_thread *thr = container_of(stdio, struct fsck_thread, thr);
-       struct bch_fs *c = bch2_fs_open(thr->devs, thr->nr_devs, thr->opts);
-
-       if (IS_ERR(c))
-               return PTR_ERR(c);
+       struct bch_fs *c = thr->c;
 
-       int ret = 0;
-       if (test_bit(BCH_FS_errors_fixed, &c->flags))
-               ret |= 1;
-       if (test_bit(BCH_FS_error, &c->flags))
-               ret |= 4;
+       int ret = PTR_ERR_OR_ZERO(c);
+       if (ret)
+               return ret;
 
-       bch2_fs_stop(c);
+       ret = bch2_fs_start(thr->c);
+       if (ret)
+               goto err;
 
-       if (ret & 1)
+       if (test_bit(BCH_FS_errors_fixed, &c->flags)) {
                bch2_stdio_redirect_printf(&stdio->stdio, false, "%s: errors fixed\n", c->name);
-       if (ret & 4)
+               ret |= 1;
+       }
+       if (test_bit(BCH_FS_error, &c->flags)) {
                bch2_stdio_redirect_printf(&stdio->stdio, false, "%s: still has errors\n", c->name);
-
+               ret |= 4;
+       }
+err:
+       bch2_fs_stop(c);
        return ret;
 }
 
@@ -182,7 +178,7 @@ static long bch2_ioctl_fsck_offline(struct bch_ioctl_fsck_offline __user *user_a
 {
        struct bch_ioctl_fsck_offline arg;
        struct fsck_thread *thr = NULL;
-       u64 *devs = NULL;
+       darray_str(devs) = {};
        long ret = 0;
 
        if (copy_from_user(&arg, user_arg, sizeof(arg)))
@@ -194,29 +190,32 @@ static long bch2_ioctl_fsck_offline(struct bch_ioctl_fsck_offline __user *user_a
        if (!capable(CAP_SYS_ADMIN))
                return -EPERM;
 
-       if (!(devs = kcalloc(arg.nr_devs, sizeof(*devs), GFP_KERNEL)) ||
-           !(thr = kzalloc(sizeof(*thr), GFP_KERNEL)) ||
-           !(thr->devs = kcalloc(arg.nr_devs, sizeof(*thr->devs), GFP_KERNEL))) {
-               ret = -ENOMEM;
-               goto err;
-       }
+       for (size_t i = 0; i < arg.nr_devs; i++) {
+               u64 dev_u64;
+               ret = copy_from_user_errcode(&dev_u64, &user_arg->devs[i], sizeof(u64));
+               if (ret)
+                       goto err;
 
-       thr->opts = bch2_opts_empty();
-       thr->nr_devs = arg.nr_devs;
+               char *dev_str = strndup_user((char __user *)(unsigned long) dev_u64, PATH_MAX);
+               ret = PTR_ERR_OR_ZERO(dev_str);
+               if (ret)
+                       goto err;
 
-       if (copy_from_user(devs, &user_arg->devs[0],
-                          array_size(sizeof(user_arg->devs[0]), arg.nr_devs))) {
-               ret = -EINVAL;
-               goto err;
+               ret = darray_push(&devs, dev_str);
+               if (ret) {
+                       kfree(dev_str);
+                       goto err;
+               }
        }
 
-       for (size_t i = 0; i < arg.nr_devs; i++) {
-               thr->devs[i] = strndup_user((char __user *)(unsigned long) devs[i], PATH_MAX);
-               ret = PTR_ERR_OR_ZERO(thr->devs[i]);
-               if (ret)
-                       goto err;
+       thr = kzalloc(sizeof(*thr), GFP_KERNEL);
+       if (!thr) {
+               ret = -ENOMEM;
+               goto err;
        }
 
+       thr->opts = bch2_opts_empty();
+
        if (arg.opts) {
                char *optstr = strndup_user((char __user *)(unsigned long) arg.opts, 1 << 16);
 
@@ -230,15 +229,26 @@ static long bch2_ioctl_fsck_offline(struct bch_ioctl_fsck_offline __user *user_a
 
        opt_set(thr->opts, stdio, (u64)(unsigned long)&thr->thr.stdio);
 
+       /* We need request_key() to be called before we punt to kthread: */
+       opt_set(thr->opts, nostart, true);
+
+       thr->c = bch2_fs_open(devs.data, arg.nr_devs, thr->opts);
+
+       if (!IS_ERR(thr->c) &&
+           thr->c->opts.errors == BCH_ON_ERROR_panic)
+               thr->c->opts.errors = BCH_ON_ERROR_ro;
+
        ret = bch2_run_thread_with_stdio(&thr->thr, &bch2_offline_fsck_ops);
-err:
-       if (ret < 0) {
-               if (thr)
-                       bch2_fsck_thread_exit(&thr->thr);
-               pr_err("ret %s", bch2_err_str(ret));
-       }
-       kfree(devs);
+out:
+       darray_for_each(devs, i)
+               kfree(*i);
+       darray_exit(&devs);
        return ret;
+err:
+       if (thr)
+               bch2_fsck_thread_exit(&thr->thr);
+       pr_err("ret %s", bch2_err_str(ret));
+       goto out;
 }
 
 static long bch2_global_ioctl(unsigned cmd, void __user *arg)
index 34731ee0217f62f6e43fb691e76083c46026b127..0022b51ce3c09cc9eafaab2f0639c944078d8c54 100644 (file)
@@ -598,6 +598,8 @@ int bch2_data_update_init(struct btree_trans *trans,
                i++;
        }
 
+       unsigned durability_required = max(0, (int) (io_opts.data_replicas - durability_have));
+
        /*
         * If current extent durability is less than io_opts.data_replicas,
         * we're not trying to rereplicate the extent up to data_replicas here -
@@ -607,7 +609,7 @@ int bch2_data_update_init(struct btree_trans *trans,
         * rereplicate, currently, so that users don't get an unexpected -ENOSPC
         */
        if (!(m->data_opts.write_flags & BCH_WRITE_CACHED) &&
-           durability_have >= io_opts.data_replicas) {
+           !durability_required) {
                m->data_opts.kill_ptrs |= m->data_opts.rewrite_ptrs;
                m->data_opts.rewrite_ptrs = 0;
                /* if iter == NULL, it's just a promote */
@@ -616,11 +618,18 @@ int bch2_data_update_init(struct btree_trans *trans,
                goto done;
        }
 
-       m->op.nr_replicas = min(durability_removing, io_opts.data_replicas - durability_have) +
+       m->op.nr_replicas = min(durability_removing, durability_required) +
                m->data_opts.extra_replicas;
-       m->op.nr_replicas_required = m->op.nr_replicas;
 
-       BUG_ON(!m->op.nr_replicas);
+       /*
+        * If device(s) were set to durability=0 after data was written to them
+        * we can end up with a durability=0 extent, and the normal algorithm
+        * that tries not to increase durability doesn't work:
+        */
+       if (!(durability_have + durability_removing))
+               m->op.nr_replicas = max((unsigned) m->op.nr_replicas, 1);
+
+       m->op.nr_replicas_required = m->op.nr_replicas;
 
        if (reserve_sectors) {
                ret = bch2_disk_reservation_add(c, &m->op.res, reserve_sectors,
index 208ce6f0fc4317d561582bae51785da2c016a1cd..cd99b739941447f4c54037c8dc87bffd5f5e0d25 100644 (file)
@@ -13,6 +13,7 @@
 #include "btree_iter.h"
 #include "btree_locking.h"
 #include "btree_update.h"
+#include "btree_update_interior.h"
 #include "buckets.h"
 #include "debug.h"
 #include "error.h"
@@ -668,7 +669,7 @@ static ssize_t bch2_journal_pins_read(struct file *file, char __user *buf,
        i->size = size;
        i->ret  = 0;
 
-       do {
+       while (1) {
                err = flush_buf(i);
                if (err)
                        return err;
@@ -676,9 +677,12 @@ static ssize_t bch2_journal_pins_read(struct file *file, char __user *buf,
                if (!i->size)
                        break;
 
+               if (done)
+                       break;
+
                done = bch2_journal_seq_pins_to_text(&i->buf, &c->journal, &i->iter);
                i->iter++;
-       } while (!done);
+       }
 
        if (i->buf.allocation_failure)
                return -ENOMEM;
@@ -693,13 +697,45 @@ static const struct file_operations journal_pins_ops = {
        .read           = bch2_journal_pins_read,
 };
 
+static ssize_t bch2_btree_updates_read(struct file *file, char __user *buf,
+                                      size_t size, loff_t *ppos)
+{
+       struct dump_iter *i = file->private_data;
+       struct bch_fs *c = i->c;
+       int err;
+
+       i->ubuf = buf;
+       i->size = size;
+       i->ret  = 0;
+
+       if (!i->iter) {
+               bch2_btree_updates_to_text(&i->buf, c);
+               i->iter++;
+       }
+
+       err = flush_buf(i);
+       if (err)
+               return err;
+
+       if (i->buf.allocation_failure)
+               return -ENOMEM;
+
+       return i->ret;
+}
+
+static const struct file_operations btree_updates_ops = {
+       .owner          = THIS_MODULE,
+       .open           = bch2_dump_open,
+       .release        = bch2_dump_release,
+       .read           = bch2_btree_updates_read,
+};
+
 static int btree_transaction_stats_open(struct inode *inode, struct file *file)
 {
        struct bch_fs *c = inode->i_private;
        struct dump_iter *i;
 
        i = kzalloc(sizeof(struct dump_iter), GFP_KERNEL);
-
        if (!i)
                return -ENOMEM;
 
@@ -866,6 +902,20 @@ void bch2_fs_debug_exit(struct bch_fs *c)
                debugfs_remove_recursive(c->fs_debug_dir);
 }
 
+static void bch2_fs_debug_btree_init(struct bch_fs *c, struct btree_debug *bd)
+{
+       struct dentry *d;
+
+       d = debugfs_create_dir(bch2_btree_id_str(bd->id), c->btree_debug_dir);
+
+       debugfs_create_file("keys", 0400, d, bd, &btree_debug_ops);
+
+       debugfs_create_file("formats", 0400, d, bd, &btree_format_debug_ops);
+
+       debugfs_create_file("bfloat-failed", 0400, d, bd,
+                           &bfloat_failed_debug_ops);
+}
+
 void bch2_fs_debug_init(struct bch_fs *c)
 {
        struct btree_debug *bd;
@@ -888,6 +938,9 @@ void bch2_fs_debug_init(struct bch_fs *c)
        debugfs_create_file("journal_pins", 0400, c->fs_debug_dir,
                            c->btree_debug, &journal_pins_ops);
 
+       debugfs_create_file("btree_updates", 0400, c->fs_debug_dir,
+                           c->btree_debug, &btree_updates_ops);
+
        debugfs_create_file("btree_transaction_stats", 0400, c->fs_debug_dir,
                            c, &btree_transaction_stats_op);
 
@@ -902,21 +955,7 @@ void bch2_fs_debug_init(struct bch_fs *c)
             bd < c->btree_debug + ARRAY_SIZE(c->btree_debug);
             bd++) {
                bd->id = bd - c->btree_debug;
-               debugfs_create_file(bch2_btree_id_str(bd->id),
-                                   0400, c->btree_debug_dir, bd,
-                                   &btree_debug_ops);
-
-               snprintf(name, sizeof(name), "%s-formats",
-                        bch2_btree_id_str(bd->id));
-
-               debugfs_create_file(name, 0400, c->btree_debug_dir, bd,
-                                   &btree_format_debug_ops);
-
-               snprintf(name, sizeof(name), "%s-bfloat-failed",
-                        bch2_btree_id_str(bd->id));
-
-               debugfs_create_file(name, 0400, c->btree_debug_dir, bd,
-                                   &bfloat_failed_debug_ops);
+               bch2_fs_debug_btree_init(c, bd);
        }
 }
 
index 4ce5e957a6e9162307d98b5b74b02338087f7e1e..0f955c3c76a7bcdce86556e4d09ba0e5cf4e7f9a 100644 (file)
@@ -115,7 +115,7 @@ static void swap_bytes(void *a, void *b, size_t n)
 
 struct wrapper {
        cmp_func_t cmp;
-       swap_func_t swap;
+       swap_func_t swap_func;
 };
 
 /*
@@ -125,7 +125,7 @@ struct wrapper {
 static void do_swap(void *a, void *b, size_t size, swap_r_func_t swap_func, const void *priv)
 {
        if (swap_func == SWAP_WRAPPER) {
-               ((const struct wrapper *)priv)->swap(a, b, (int)size);
+               ((const struct wrapper *)priv)->swap_func(a, b, (int)size);
                return;
        }
 
@@ -174,7 +174,7 @@ void eytzinger0_sort_r(void *base, size_t n, size_t size,
        int i, c, r;
 
        /* called from 'sort' without swap function, let's pick the default */
-       if (swap_func == SWAP_WRAPPER && !((struct wrapper *)priv)->swap)
+       if (swap_func == SWAP_WRAPPER && !((struct wrapper *)priv)->swap_func)
                swap_func = NULL;
 
        if (!swap_func) {
@@ -227,7 +227,7 @@ void eytzinger0_sort(void *base, size_t n, size_t size,
 {
        struct wrapper w = {
                .cmp  = cmp_func,
-               .swap = swap_func,
+               .swap_func = swap_func,
        };
 
        return eytzinger0_sort_r(base, n, size, _CMP_WRAPPER, SWAP_WRAPPER, &w);
index ee0e2df33322d2dccb60e1ed90257863769ead0d..24840aee335c0ffeabd3ad69c79665cc005e28d8 100644 (file)
@@ -242,8 +242,8 @@ static inline unsigned inorder_to_eytzinger0(unsigned i, unsigned size)
             (_i) = eytzinger0_next((_i), (_size)))
 
 /* return greatest node <= @search, or -1 if not found */
-static inline ssize_t eytzinger0_find_le(void *base, size_t nr, size_t size,
-                                        cmp_func_t cmp, const void *search)
+static inline int eytzinger0_find_le(void *base, size_t nr, size_t size,
+                                    cmp_func_t cmp, const void *search)
 {
        unsigned i, n = 0;
 
@@ -256,18 +256,32 @@ static inline ssize_t eytzinger0_find_le(void *base, size_t nr, size_t size,
        } while (n < nr);
 
        if (n & 1) {
-               /* @i was greater than @search, return previous node: */
+               /*
+                * @i was greater than @search, return previous node:
+                *
+                * if @i was leftmost/smallest element,
+                * eytzinger0_prev(eytzinger0_first()) returns -1, as expected
+                */
                return eytzinger0_prev(i, nr);
        } else {
                return i;
        }
 }
 
-static inline ssize_t eytzinger0_find_gt(void *base, size_t nr, size_t size,
-                                        cmp_func_t cmp, const void *search)
+static inline int eytzinger0_find_gt(void *base, size_t nr, size_t size,
+                                    cmp_func_t cmp, const void *search)
 {
        ssize_t idx = eytzinger0_find_le(base, nr, size, cmp, search);
-       return eytzinger0_next(idx, size);
+
+       /*
+        * if eytzinger0_find_le() returned -1 - no element was <= search - we
+        * want to return the first element; next/prev identities mean this
+        * works as expected
+        *
+        * similarly if find_le() returns last element, we should return -1;
+        * identities mean this all works out:
+        */
+       return eytzinger0_next(idx, nr);
 }
 
 #define eytzinger0_find(base, nr, size, _cmp, search)                  \
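The eytzinger.h hunks above lean on a simple contract: `find_le()` returns the greatest element <= search (or -1), and `find_gt()` is derived from it via the next/prev identities. A reference model of that contract over a plain sorted array (a hypothetical linear-scan sketch, not the actual eytzinger traversal) makes the corner cases in the comments concrete:

```c
#include <assert.h>
#include <stddef.h>

/* greatest index with base[i] <= search, or -1 if none */
static int sorted_find_le(const int *base, size_t nr, int search)
{
	int ret = -1;

	for (size_t i = 0; i < nr; i++)
		if (base[i] <= search)
			ret = (int) i;
	return ret;
}

/* first index with base[i] > search, or -1 if none */
static int sorted_find_gt(const int *base, size_t nr, int search)
{
	int le = sorted_find_le(base, nr, search);

	/*
	 * le == -1 (nothing <= search) -> first element, index 0;
	 * le == nr - 1 (everything <= search) -> -1.
	 * These are the same identities the eytzinger comment relies on.
	 */
	return le + 1 < (int) nr ? le + 1 : -1;
}
```

In the eytzinger version, "next index" is `eytzinger0_next()` rather than `+1`, which is also why the hunk's fix of passing `nr` instead of `size` to it matters.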
index ab811c0dad26accfb4924eaef4cccb3ab957087c..04a577848b015cd900a1a040ec0565ffb2f69811 100644 (file)
@@ -67,6 +67,8 @@ void bch2_journal_set_watermark(struct journal *j)
            track_event_change(&c->times[BCH_TIME_blocked_write_buffer_full], low_on_wb))
                trace_and_count(c, journal_full, c);
 
+       mod_bit(JOURNAL_SPACE_LOW, &j->flags, low_on_space || low_on_pin);
+
        swap(watermark, j->watermark);
        if (watermark > j->watermark)
                journal_wake(j);
index 8c053cb64ca5ee25b9a5b2613f2fcd9e03d517d3..b5161b5d76a00874ed9ed88a0969927f2cfc9dbe 100644 (file)
@@ -134,6 +134,7 @@ enum journal_flags {
        JOURNAL_STARTED,
        JOURNAL_MAY_SKIP_FLUSH,
        JOURNAL_NEED_FLUSH_WRITE,
+       JOURNAL_SPACE_LOW,
 };
 
 /* Reasons we may fail to get a journal reservation: */
index b76c16152579c6d3e5a51dbf54c839392c0ce0b2..0f328aba9760ba0e89fd015ee757239b6d8bd8c4 100644 (file)
@@ -47,20 +47,6 @@ void bch2_btree_lost_data(struct bch_fs *c, enum btree_id btree)
        }
 }
 
-static bool btree_id_is_alloc(enum btree_id id)
-{
-       switch (id) {
-       case BTREE_ID_alloc:
-       case BTREE_ID_backpointers:
-       case BTREE_ID_need_discard:
-       case BTREE_ID_freespace:
-       case BTREE_ID_bucket_gens:
-               return true;
-       default:
-               return false;
-       }
-}
-
 /* for -o reconstruct_alloc: */
 static void bch2_reconstruct_alloc(struct bch_fs *c)
 {
index 0e806f04f3d7c5117ade3d612b1c851da243aead..544322d5c2517070143d367fa15d4ff353642556 100644 (file)
@@ -125,6 +125,15 @@ static inline u32 get_ancestor_below(struct snapshot_table *t, u32 id, u32 ances
        return s->parent;
 }
 
+static bool test_ancestor_bitmap(struct snapshot_table *t, u32 id, u32 ancestor)
+{
+       const struct snapshot_t *s = __snapshot_t(t, id);
+       if (!s)
+               return false;
+
+       return test_bit(ancestor - id - 1, s->is_ancestor);
+}
+
 bool __bch2_snapshot_is_ancestor(struct bch_fs *c, u32 id, u32 ancestor)
 {
        bool ret;
@@ -140,13 +149,11 @@ bool __bch2_snapshot_is_ancestor(struct bch_fs *c, u32 id, u32 ancestor)
        while (id && id < ancestor - IS_ANCESTOR_BITMAP)
                id = get_ancestor_below(t, id, ancestor);
 
-       if (id && id < ancestor) {
-               ret = test_bit(ancestor - id - 1, __snapshot_t(t, id)->is_ancestor);
+       ret = id && id < ancestor
+               ? test_ancestor_bitmap(t, id, ancestor)
+               : id == ancestor;
 
-               EBUG_ON(ret != __bch2_snapshot_is_ancestor_early(t, id, ancestor));
-       } else {
-               ret = id == ancestor;
-       }
+       EBUG_ON(ret != __bch2_snapshot_is_ancestor_early(t, id, ancestor));
 out:
        rcu_read_unlock();
 
index e0aa3655b63b4cd7ca8dd2ee03e0370474427ab3..5eee055ee2721a3967fb31ca38adcbc5672521d6 100644 (file)
@@ -143,7 +143,7 @@ void bch2_free_super(struct bch_sb_handle *sb)
 {
        kfree(sb->bio);
        if (!IS_ERR_OR_NULL(sb->s_bdev_file))
-               fput(sb->s_bdev_file);
+               bdev_fput(sb->s_bdev_file);
        kfree(sb->holder);
        kfree(sb->sb_name);
 
index ed63018f21bef58b2aa854f9c3f05ad1b3f26202..8daf80a38d60c6e4fa97b97345d3d4ecb80e7e88 100644 (file)
@@ -288,8 +288,13 @@ static void __bch2_fs_read_only(struct bch_fs *c)
        if (test_bit(JOURNAL_REPLAY_DONE, &c->journal.flags) &&
            !test_bit(BCH_FS_emergency_ro, &c->flags))
                set_bit(BCH_FS_clean_shutdown, &c->flags);
+
        bch2_fs_journal_stop(&c->journal);
 
+       bch_info(c, "%sshutdown complete, journal seq %llu",
+                test_bit(BCH_FS_clean_shutdown, &c->flags) ? "" : "un",
+                c->journal.seq_ondisk);
+
        /*
         * After stopping journal:
         */
index c86a93a8d8fc81bbe373efcbec74f3e2563e6da5..b18b0cc81b594ad6144b43599418418a1caf5e95 100644 (file)
@@ -17,7 +17,6 @@
 #include "btree_iter.h"
 #include "btree_key_cache.h"
 #include "btree_update.h"
-#include "btree_update_interior.h"
 #include "btree_gc.h"
 #include "buckets.h"
 #include "clock.h"
@@ -166,7 +165,6 @@ read_attribute(btree_write_stats);
 read_attribute(btree_cache_size);
 read_attribute(compression_stats);
 read_attribute(journal_debug);
-read_attribute(btree_updates);
 read_attribute(btree_cache);
 read_attribute(btree_key_cache);
 read_attribute(stripes_heap);
@@ -415,9 +413,6 @@ SHOW(bch2_fs)
        if (attr == &sysfs_journal_debug)
                bch2_journal_debug_to_text(out, &c->journal);
 
-       if (attr == &sysfs_btree_updates)
-               bch2_btree_updates_to_text(out, c);
-
        if (attr == &sysfs_btree_cache)
                bch2_btree_cache_to_text(out, c);
 
@@ -639,7 +634,6 @@ SYSFS_OPS(bch2_fs_internal);
 struct attribute *bch2_fs_internal_files[] = {
        &sysfs_flags,
        &sysfs_journal_debug,
-       &sysfs_btree_updates,
        &sysfs_btree_cache,
        &sysfs_btree_key_cache,
        &sysfs_new_stripes,
index b3fe9fc577470ff14659df531959c9e7aa6c324b..bfec656f94c0758ee081ea7d36fe1e272baca810 100644 (file)
@@ -672,7 +672,7 @@ static int __do_delete(struct btree_trans *trans, struct bpos pos)
 
        bch2_trans_iter_init(trans, &iter, BTREE_ID_xattrs, pos,
                             BTREE_ITER_INTENT);
-       k = bch2_btree_iter_peek(&iter);
+       k = bch2_btree_iter_peek_upto(&iter, POS(0, U64_MAX));
        ret = bkey_err(k);
        if (ret)
                goto err;
index b7e7c29278fc052a90fe7c029e8fd0626c48ddc5..5cf885b09986ac95effa15f7a37fc78bd56323cb 100644 (file)
@@ -788,6 +788,14 @@ static inline int copy_from_user_errcode(void *to, const void __user *from, unsi
 
 #endif
 
+static inline void mod_bit(long nr, volatile unsigned long *addr, bool v)
+{
+       if (v)
+               set_bit(nr, addr);
+       else
+               clear_bit(nr, addr);
+}
+
 static inline void __set_bit_le64(size_t bit, __le64 *addr)
 {
        addr[bit / 64] |= cpu_to_le64(BIT_ULL(bit % 64));
@@ -795,7 +803,7 @@ static inline void __set_bit_le64(size_t bit, __le64 *addr)
 
 static inline void __clear_bit_le64(size_t bit, __le64 *addr)
 {
-       addr[bit / 64] &= !cpu_to_le64(BIT_ULL(bit % 64));
+       addr[bit / 64] &= ~cpu_to_le64(BIT_ULL(bit % 64));
 }
 
 static inline bool test_bit_le64(size_t bit, __le64 *addr)
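The one-character fix in `__clear_bit_le64()` above (`!` to `~`) is easy to miss but significant: logical NOT of a nonzero mask yields 0, so the buggy version ANDed the whole word with zero, wiping every bit instead of clearing one. A userspace sketch with both variants side by side (plain `uint64_t` stands in for the `__le64`/`cpu_to_le64` handling, which is irrelevant to the bug):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Buggy variant: !mask is 0 for any nonzero mask, so the AND clears
 * the entire 64-bit word, not just the requested bit. */
static void clear_bit64_buggy(size_t bit, uint64_t *addr)
{
	addr[bit / 64] &= !(UINT64_C(1) << (bit % 64));
}

/* Fixed variant: ~mask has every bit set except the target, so the
 * AND clears only that one bit. */
static void clear_bit64_fixed(size_t bit, uint64_t *addr)
{
	addr[bit / 64] &= ~(UINT64_C(1) << (bit % 64));
}
```

The same file's new `mod_bit()` helper is just a branch between `set_bit()` and `clear_bit()`, letting callers pass the desired bit value as a bool.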
index dd6f566a383f00e83c9f36125ee4ecd3fc0e3541..121ab890bd0557e4779bd25d00dc422ba8fb1b3f 100644 (file)
@@ -1133,6 +1133,9 @@ __btrfs_commit_inode_delayed_items(struct btrfs_trans_handle *trans,
        if (ret)
                return ret;
 
+       ret = btrfs_record_root_in_trans(trans, node->root);
+       if (ret)
+               return ret;
        ret = btrfs_update_delayed_inode(trans, node->root, path, node);
        return ret;
 }
index 37701531eeb1ba486cd8117f104794083dff8816..c65fe5de40220d3b51003bb73b3e6414eaefba08 100644 (file)
@@ -2533,7 +2533,7 @@ void btrfs_clear_delalloc_extent(struct btrfs_inode *inode,
                 */
                if (bits & EXTENT_CLEAR_META_RESV &&
                    root != fs_info->tree_root)
-                       btrfs_delalloc_release_metadata(inode, len, false);
+                       btrfs_delalloc_release_metadata(inode, len, true);
 
                /* For sanity tests. */
                if (btrfs_is_testing(fs_info))
@@ -4503,6 +4503,7 @@ int btrfs_delete_subvolume(struct btrfs_inode *dir, struct dentry *dentry)
        struct btrfs_trans_handle *trans;
        struct btrfs_block_rsv block_rsv;
        u64 root_flags;
+       u64 qgroup_reserved = 0;
        int ret;
 
        down_write(&fs_info->subvol_sem);
@@ -4547,12 +4548,20 @@ int btrfs_delete_subvolume(struct btrfs_inode *dir, struct dentry *dentry)
        ret = btrfs_subvolume_reserve_metadata(root, &block_rsv, 5, true);
        if (ret)
                goto out_undead;
+       qgroup_reserved = block_rsv.qgroup_rsv_reserved;
 
        trans = btrfs_start_transaction(root, 0);
        if (IS_ERR(trans)) {
                ret = PTR_ERR(trans);
                goto out_release;
        }
+       ret = btrfs_record_root_in_trans(trans, root);
+       if (ret) {
+               btrfs_abort_transaction(trans, ret);
+               goto out_end_trans;
+       }
+       btrfs_qgroup_convert_reserved_meta(root, qgroup_reserved);
+       qgroup_reserved = 0;
        trans->block_rsv = &block_rsv;
        trans->bytes_reserved = block_rsv.size;
 
@@ -4611,7 +4620,9 @@ out_end_trans:
        ret = btrfs_end_transaction(trans);
        inode->i_flags |= S_DEAD;
 out_release:
-       btrfs_subvolume_release_metadata(root, &block_rsv);
+       btrfs_block_rsv_release(fs_info, &block_rsv, (u64)-1, NULL);
+       if (qgroup_reserved)
+               btrfs_qgroup_free_meta_prealloc(root, qgroup_reserved);
 out_undead:
        if (ret) {
                spin_lock(&dest->root_item_lock);
index 294e31edec9d3bbe566e9234c8ef76d73612adbc..55f3ba6a831ca194e2d8405dbf7caa60fbd81dfc 100644 (file)
@@ -613,6 +613,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
        int ret;
        dev_t anon_dev;
        u64 objectid;
+       u64 qgroup_reserved = 0;
 
        root_item = kzalloc(sizeof(*root_item), GFP_KERNEL);
        if (!root_item)
@@ -650,13 +651,18 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
                                               trans_num_items, false);
        if (ret)
                goto out_new_inode_args;
+       qgroup_reserved = block_rsv.qgroup_rsv_reserved;
 
        trans = btrfs_start_transaction(root, 0);
        if (IS_ERR(trans)) {
                ret = PTR_ERR(trans);
-               btrfs_subvolume_release_metadata(root, &block_rsv);
-               goto out_new_inode_args;
+               goto out_release_rsv;
        }
+       ret = btrfs_record_root_in_trans(trans, BTRFS_I(dir)->root);
+       if (ret)
+               goto out;
+       btrfs_qgroup_convert_reserved_meta(root, qgroup_reserved);
+       qgroup_reserved = 0;
        trans->block_rsv = &block_rsv;
        trans->bytes_reserved = block_rsv.size;
        /* Tree log can't currently deal with an inode which is a new root. */
@@ -767,9 +773,11 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
 out:
        trans->block_rsv = NULL;
        trans->bytes_reserved = 0;
-       btrfs_subvolume_release_metadata(root, &block_rsv);
-
        btrfs_end_transaction(trans);
+out_release_rsv:
+       btrfs_block_rsv_release(fs_info, &block_rsv, (u64)-1, NULL);
+       if (qgroup_reserved)
+               btrfs_qgroup_free_meta_prealloc(root, qgroup_reserved);
 out_new_inode_args:
        btrfs_new_inode_args_destroy(&new_inode_args);
 out_inode:
@@ -791,6 +799,8 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
        struct btrfs_pending_snapshot *pending_snapshot;
        unsigned int trans_num_items;
        struct btrfs_trans_handle *trans;
+       struct btrfs_block_rsv *block_rsv;
+       u64 qgroup_reserved = 0;
        int ret;
 
        /* We do not support snapshotting right now. */
@@ -827,19 +837,19 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
                goto free_pending;
        }
 
-       btrfs_init_block_rsv(&pending_snapshot->block_rsv,
-                            BTRFS_BLOCK_RSV_TEMP);
+       block_rsv = &pending_snapshot->block_rsv;
+       btrfs_init_block_rsv(block_rsv, BTRFS_BLOCK_RSV_TEMP);
        /*
         * 1 to add dir item
         * 1 to add dir index
         * 1 to update parent inode item
         */
        trans_num_items = create_subvol_num_items(inherit) + 3;
-       ret = btrfs_subvolume_reserve_metadata(BTRFS_I(dir)->root,
-                                              &pending_snapshot->block_rsv,
+       ret = btrfs_subvolume_reserve_metadata(BTRFS_I(dir)->root, block_rsv,
                                               trans_num_items, false);
        if (ret)
                goto free_pending;
+       qgroup_reserved = block_rsv->qgroup_rsv_reserved;
 
        pending_snapshot->dentry = dentry;
        pending_snapshot->root = root;
@@ -852,6 +862,13 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
                ret = PTR_ERR(trans);
                goto fail;
        }
+       ret = btrfs_record_root_in_trans(trans, BTRFS_I(dir)->root);
+       if (ret) {
+               btrfs_end_transaction(trans);
+               goto fail;
+       }
+       btrfs_qgroup_convert_reserved_meta(root, qgroup_reserved);
+       qgroup_reserved = 0;
 
        trans->pending_snapshot = pending_snapshot;
 
@@ -881,7 +898,9 @@ fail:
        if (ret && pending_snapshot->snap)
                pending_snapshot->snap->anon_dev = 0;
        btrfs_put_root(pending_snapshot->snap);
-       btrfs_subvolume_release_metadata(root, &pending_snapshot->block_rsv);
+       btrfs_block_rsv_release(fs_info, block_rsv, (u64)-1, NULL);
+       if (qgroup_reserved)
+               btrfs_qgroup_free_meta_prealloc(root, qgroup_reserved);
 free_pending:
        if (pending_snapshot->anon_dev)
                free_anon_bdev(pending_snapshot->anon_dev);
index 5f90f0605b12f7126e93d69e7fbec42720301fad..cf8820ce7aa2979920c6daafc1071c26571ecee6 100644 (file)
@@ -4495,6 +4495,8 @@ void btrfs_qgroup_convert_reserved_meta(struct btrfs_root *root, int num_bytes)
                                      BTRFS_QGROUP_RSV_META_PREALLOC);
        trace_qgroup_meta_convert(root, num_bytes);
        qgroup_convert_meta(fs_info, root->root_key.objectid, num_bytes);
+       if (!sb_rdonly(fs_info->sb))
+               add_root_meta_rsv(root, num_bytes, BTRFS_QGROUP_RSV_META_PERTRANS);
 }
 
 /*
index 4bb538a372ce56404de84d6ddbca7fb951715949..7007f9e0c97282bc5f415f56d14e02e79895aafc 100644 (file)
@@ -548,13 +548,3 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
        }
        return ret;
 }
-
-void btrfs_subvolume_release_metadata(struct btrfs_root *root,
-                                     struct btrfs_block_rsv *rsv)
-{
-       struct btrfs_fs_info *fs_info = root->fs_info;
-       u64 qgroup_to_release;
-
-       btrfs_block_rsv_release(fs_info, rsv, (u64)-1, &qgroup_to_release);
-       btrfs_qgroup_convert_reserved_meta(root, qgroup_to_release);
-}
index 6f929cf3bd4967560964659ee9f631e6766a07ab..8f5739e732b9b6c9cc1d47ee34e20d50a403d90c 100644 (file)
@@ -18,8 +18,6 @@ struct btrfs_trans_handle;
 int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
                                     struct btrfs_block_rsv *rsv,
                                     int nitems, bool use_global_rsv);
-void btrfs_subvolume_release_metadata(struct btrfs_root *root,
-                                     struct btrfs_block_rsv *rsv);
 int btrfs_add_root_ref(struct btrfs_trans_handle *trans, u64 root_id,
                       u64 ref_id, u64 dirid, u64 sequence,
                       const struct fscrypt_str *name);
index 46e8426adf4f15768507303430b38c9e6be56c7d..85f359e0e0a7f2ea078157c85a1f78b0ea2bcadd 100644 (file)
@@ -745,14 +745,6 @@ again:
                h->reloc_reserved = reloc_reserved;
        }
 
-       /*
-        * Now that we have found a transaction to be a part of, convert the
-        * qgroup reservation from prealloc to pertrans. A different transaction
-        * can't race in and free our pertrans out from under us.
-        */
-       if (qgroup_reserved)
-               btrfs_qgroup_convert_reserved_meta(root, qgroup_reserved);
-
 got_it:
        if (!current->journal_info)
                current->journal_info = h;
@@ -786,8 +778,15 @@ got_it:
                 * not just freed.
                 */
                btrfs_end_transaction(h);
-               return ERR_PTR(ret);
+               goto reserve_fail;
        }
+       /*
+        * Now that we have found a transaction to be a part of, convert the
+        * qgroup reservation from prealloc to pertrans. A different transaction
+        * can't race in and free our pertrans out from under us.
+        */
+       if (qgroup_reserved)
+               btrfs_qgroup_convert_reserved_meta(root, qgroup_reserved);
 
        return h;
 
@@ -1495,6 +1494,7 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans)
                        radix_tree_tag_clear(&fs_info->fs_roots_radix,
                                        (unsigned long)root->root_key.objectid,
                                        BTRFS_ROOT_TRANS_TAG);
+                       btrfs_qgroup_free_meta_all_pertrans(root);
                        spin_unlock(&fs_info->fs_roots_radix_lock);
 
                        btrfs_free_log(trans, root);
@@ -1519,7 +1519,6 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans)
                        if (ret2)
                                return ret2;
                        spin_lock(&fs_info->fs_roots_radix_lock);
-                       btrfs_qgroup_free_meta_all_pertrans(root);
                }
        }
        spin_unlock(&fs_info->fs_roots_radix_lock);
index 39e75131fd5aa01d732f703cb1f421a3696bffd6..9901057a15ba79a110c8a90423bc7707102590d8 100644 (file)
@@ -495,7 +495,7 @@ static void cramfs_kill_sb(struct super_block *sb)
                sb->s_mtd = NULL;
        } else if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV) && sb->s_bdev) {
                sync_blockdev(sb->s_bdev);
-               fput(sb->s_bdev_file);
+               bdev_fput(sb->s_bdev_file);
        }
        kfree(sbi);
 }
index cfb8449c731f9ac53fb3add808e13493175508c4..044135796f2b6ebe86e56b69f57501e7567d761b 100644 (file)
@@ -5668,7 +5668,7 @@ failed_mount:
        brelse(sbi->s_sbh);
        if (sbi->s_journal_bdev_file) {
                invalidate_bdev(file_bdev(sbi->s_journal_bdev_file));
-               fput(sbi->s_journal_bdev_file);
+               bdev_fput(sbi->s_journal_bdev_file);
        }
 out_fail:
        invalidate_bdev(sb->s_bdev);
@@ -5913,7 +5913,7 @@ static struct file *ext4_get_journal_blkdev(struct super_block *sb,
 out_bh:
        brelse(bh);
 out_bdev:
-       fput(bdev_file);
+       bdev_fput(bdev_file);
        return ERR_PTR(errno);
 }
 
@@ -5952,7 +5952,7 @@ static journal_t *ext4_open_dev_journal(struct super_block *sb,
 out_journal:
        jbd2_journal_destroy(journal);
 out_bdev:
-       fput(bdev_file);
+       bdev_fput(bdev_file);
        return ERR_PTR(errno);
 }
 
@@ -7327,7 +7327,7 @@ static void ext4_kill_sb(struct super_block *sb)
        kill_block_super(sb);
 
        if (bdev_file)
-               fput(bdev_file);
+               bdev_fput(bdev_file);
 }
 
 static struct file_system_type ext4_fs_type = {
index a6867f26f141836dcd4a4f0136dd67a9de6c3c74..a4bc26dfdb1af5973783d2817bf2deed889f3c33 100644 (file)
@@ -1558,7 +1558,7 @@ static void destroy_device_list(struct f2fs_sb_info *sbi)
 
        for (i = 0; i < sbi->s_ndevs; i++) {
                if (i > 0)
-                       fput(FDEV(i).bdev_file);
+                       bdev_fput(FDEV(i).bdev_file);
 #ifdef CONFIG_BLK_DEV_ZONED
                kvfree(FDEV(i).blkz_seq);
 #endif
index 73389c68e25170c81d6f84483f09b43216ba4b52..9609349e92e5e1ba422369fa29a2f6345f7fe908 100644 (file)
@@ -1141,7 +1141,7 @@ journal_found:
        lbmLogShutdown(log);
 
       close:           /* close external log device */
-       fput(bdev_file);
+       bdev_fput(bdev_file);
 
       free:            /* free log descriptor */
        mutex_unlock(&jfs_log_mutex);
@@ -1485,7 +1485,7 @@ int lmLogClose(struct super_block *sb)
        bdev_file = log->bdev_file;
        rc = lmLogShutdown(log);
 
-       fput(bdev_file);
+       bdev_fput(bdev_file);
 
        kfree(log);
 
index 2391ab3c3231975bde1606875339b7736975054f..84d4093ca71317ebb7a70bde76819704e25ec7dc 100644 (file)
@@ -3042,12 +3042,9 @@ static void
 nfsd4_cb_recall_any_release(struct nfsd4_callback *cb)
 {
        struct nfs4_client *clp = cb->cb_clp;
-       struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
 
-       spin_lock(&nn->client_lock);
        clear_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags);
-       put_client_renew_locked(clp);
-       spin_unlock(&nn->client_lock);
+       drop_client(clp);
 }
 
 static int
@@ -6616,7 +6613,7 @@ deleg_reaper(struct nfsd_net *nn)
                list_add(&clp->cl_ra_cblist, &cblist);
 
                /* release in nfsd4_cb_recall_any_release */
-               atomic_inc(&clp->cl_rpc_users);
+               kref_get(&clp->cl_nfsdfs.cl_ref);
                set_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags);
                clp->cl_ra_time = ktime_get_boottime_seconds();
        }
index 902b326e1e5607d5537721b51f68c28e602e2b92..87dcaae32ff87b40c3d65a0d2463a6e60cdc0dbc 100644 (file)
@@ -62,12 +62,12 @@ static int __init copy_xbc_key_value_list(char *dst, size_t size)
                                break;
                        dst += ret;
                }
-               if (ret >= 0 && boot_command_line[0]) {
-                       ret = snprintf(dst, rest(dst, end), "# Parameters from bootloader:\n# %s\n",
-                                      boot_command_line);
-                       if (ret > 0)
-                               dst += ret;
-               }
+       }
+       if (cmdline_has_extra_options() && ret >= 0 && boot_command_line[0]) {
+               ret = snprintf(dst, rest(dst, end), "# Parameters from bootloader:\n# %s\n",
+                              boot_command_line);
+               if (ret > 0)
+                       dst += ret;
        }
 out:
        kfree(key);
index 6474529c42530628fd3969573fb175283f4f51e8..e539ccd39e1ee74cd8bdfd35d29f826be6f514e1 100644 (file)
@@ -2589,7 +2589,7 @@ static void journal_list_init(struct super_block *sb)
 static void release_journal_dev(struct reiserfs_journal *journal)
 {
        if (journal->j_bdev_file) {
-               fput(journal->j_bdev_file);
+               bdev_fput(journal->j_bdev_file);
                journal->j_bdev_file = NULL;
        }
 }
index 2be227532f399788de82a03e55970d33c67dc695..2cbb924620747f68d04ac53783c3b0f21c5ea0ab 100644 (file)
@@ -594,7 +594,7 @@ static void romfs_kill_sb(struct super_block *sb)
 #ifdef CONFIG_ROMFS_ON_BLOCK
        if (sb->s_bdev) {
                sync_blockdev(sb->s_bdev);
-               fput(sb->s_bdev_file);
+               bdev_fput(sb->s_bdev_file);
        }
 #endif
 }
index a0017724d5239312c14644bd15d2867880337d91..13a9d7acf8f8ec151323d18d44a346e060bf0ce2 100644 (file)
@@ -417,6 +417,7 @@ smb2_close_cached_fid(struct kref *ref)
 {
        struct cached_fid *cfid = container_of(ref, struct cached_fid,
                                               refcount);
+       int rc;
 
        spin_lock(&cfid->cfids->cfid_list_lock);
        if (cfid->on_list) {
@@ -430,9 +431,10 @@ smb2_close_cached_fid(struct kref *ref)
        cfid->dentry = NULL;
 
        if (cfid->is_open) {
-               SMB2_close(0, cfid->tcon, cfid->fid.persistent_fid,
+               rc = SMB2_close(0, cfid->tcon, cfid->fid.persistent_fid,
                           cfid->fid.volatile_fid);
-               atomic_dec(&cfid->tcon->num_remote_opens);
+               if (rc != -EBUSY && rc != -EAGAIN)
+                       atomic_dec(&cfid->tcon->num_remote_opens);
        }
 
        free_cached_dir(cfid);
index 226d4835c92db8ba3f1f0540a16643d0b8ac3fd0..c71ae5c043060ebf5dd7f6d9e5f63e6e7bcf7841 100644 (file)
@@ -250,6 +250,8 @@ static int cifs_debug_files_proc_show(struct seq_file *m, void *v)
        spin_lock(&cifs_tcp_ses_lock);
        list_for_each_entry(server, &cifs_tcp_ses_list, tcp_ses_list) {
                list_for_each_entry(ses, &server->smb_ses_list, smb_ses_list) {
+                       if (cifs_ses_exiting(ses))
+                               continue;
                        list_for_each_entry(tcon, &ses->tcon_list, tcon_list) {
                                spin_lock(&tcon->open_file_lock);
                                list_for_each_entry(cfile, &tcon->openFileList, tlist) {
@@ -676,6 +678,8 @@ static ssize_t cifs_stats_proc_write(struct file *file,
                        }
 #endif /* CONFIG_CIFS_STATS2 */
                        list_for_each_entry(ses, &server->smb_ses_list, smb_ses_list) {
+                               if (cifs_ses_exiting(ses))
+                                       continue;
                                list_for_each_entry(tcon, &ses->tcon_list, tcon_list) {
                                        atomic_set(&tcon->num_smbs_sent, 0);
                                        spin_lock(&tcon->stat_lock);
@@ -755,6 +759,8 @@ static int cifs_stats_proc_show(struct seq_file *m, void *v)
                        }
 #endif /* STATS2 */
                list_for_each_entry(ses, &server->smb_ses_list, smb_ses_list) {
+                       if (cifs_ses_exiting(ses))
+                               continue;
                        list_for_each_entry(tcon, &ses->tcon_list, tcon_list) {
                                i++;
                                seq_printf(m, "\n%d) %s", i, tcon->tree_name);
index aa6f1ecb7c0e8fc11f9b1fb830d8c0ca8071b631..d41eedbff674abb0e62e52ae6cc585aaa5d83d77 100644 (file)
@@ -156,6 +156,7 @@ struct workqueue_struct     *decrypt_wq;
 struct workqueue_struct        *fileinfo_put_wq;
 struct workqueue_struct        *cifsoplockd_wq;
 struct workqueue_struct        *deferredclose_wq;
+struct workqueue_struct        *serverclose_wq;
 __u32 cifs_lock_secret;
 
 /*
@@ -1888,6 +1889,13 @@ init_cifs(void)
                goto out_destroy_cifsoplockd_wq;
        }
 
+       serverclose_wq = alloc_workqueue("serverclose",
+                                          WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
+       if (!serverclose_wq) {
+               rc = -ENOMEM;
+               goto out_destroy_serverclose_wq;
+       }
+
        rc = cifs_init_inodecache();
        if (rc)
                goto out_destroy_deferredclose_wq;
@@ -1962,6 +1970,8 @@ out_destroy_decrypt_wq:
        destroy_workqueue(decrypt_wq);
 out_destroy_cifsiod_wq:
        destroy_workqueue(cifsiod_wq);
+out_destroy_serverclose_wq:
+       destroy_workqueue(serverclose_wq);
 out_clean_proc:
        cifs_proc_clean();
        return rc;
@@ -1991,6 +2001,7 @@ exit_cifs(void)
        destroy_workqueue(cifsoplockd_wq);
        destroy_workqueue(decrypt_wq);
        destroy_workqueue(fileinfo_put_wq);
+       destroy_workqueue(serverclose_wq);
        destroy_workqueue(cifsiod_wq);
        cifs_proc_clean();
 }
index 7ed9d05f6890b4d40cb11a7b8d7384c6e0111461..f6a302205f89c456d9fa3adb3dae238deeb97d10 100644 (file)
@@ -442,10 +442,10 @@ struct smb_version_operations {
        /* set fid protocol-specific info */
        void (*set_fid)(struct cifsFileInfo *, struct cifs_fid *, __u32);
        /* close a file */
-       void (*close)(const unsigned int, struct cifs_tcon *,
+       int (*close)(const unsigned int, struct cifs_tcon *,
                      struct cifs_fid *);
        /* close a file, returning file attributes and timestamps */
-       void (*close_getattr)(const unsigned int xid, struct cifs_tcon *tcon,
+       int (*close_getattr)(const unsigned int xid, struct cifs_tcon *tcon,
                      struct cifsFileInfo *pfile_info);
        /* send a flush request to the server */
        int (*flush)(const unsigned int, struct cifs_tcon *, struct cifs_fid *);
@@ -1281,7 +1281,6 @@ struct cifs_tcon {
        struct cached_fids *cfids;
        /* BB add field for back pointer to sb struct(s)? */
 #ifdef CONFIG_CIFS_DFS_UPCALL
-       struct list_head dfs_ses_list;
        struct delayed_work dfs_cache_work;
 #endif
        struct delayed_work     query_interfaces; /* query interfaces workqueue job */
@@ -1440,6 +1439,7 @@ struct cifsFileInfo {
        bool swapfile:1;
        bool oplock_break_cancelled:1;
        bool status_file_deleted:1; /* file has been deleted */
+       bool offload:1; /* offload final part of _put to a wq */
        unsigned int oplock_epoch; /* epoch from the lease break */
        __u32 oplock_level; /* oplock/lease level from the lease break */
        int count;
@@ -1448,6 +1448,7 @@ struct cifsFileInfo {
        struct cifs_search_info srch_inf;
        struct work_struct oplock_break; /* work for oplock breaks */
        struct work_struct put; /* work for the final part of _put */
+       struct work_struct serverclose; /* work for serverclose */
        struct delayed_work deferred;
        bool deferred_close_scheduled; /* Flag to indicate close is scheduled */
        char *symlink_target;
@@ -1804,7 +1805,6 @@ struct cifs_mount_ctx {
        struct TCP_Server_Info *server;
        struct cifs_ses *ses;
        struct cifs_tcon *tcon;
-       struct list_head dfs_ses_list;
 };
 
 static inline void __free_dfs_info_param(struct dfs_info3_param *param)
@@ -2105,6 +2105,7 @@ extern struct workqueue_struct *decrypt_wq;
 extern struct workqueue_struct *fileinfo_put_wq;
 extern struct workqueue_struct *cifsoplockd_wq;
 extern struct workqueue_struct *deferredclose_wq;
+extern struct workqueue_struct *serverclose_wq;
 extern __u32 cifs_lock_secret;
 
 extern mempool_t *cifs_sm_req_poolp;
@@ -2324,4 +2325,14 @@ struct smb2_compound_vars {
        struct kvec ea_iov;
 };
 
+static inline bool cifs_ses_exiting(struct cifs_ses *ses)
+{
+       bool ret;
+
+       spin_lock(&ses->ses_lock);
+       ret = ses->ses_status == SES_EXITING;
+       spin_unlock(&ses->ses_lock);
+       return ret;
+}
+
 #endif /* _CIFS_GLOB_H */
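The new `cifs_ses_exiting()` helper above snapshots `ses_status` under `ses_lock`, so the many list walkers in this series can skip sessions that are being torn down without racing the teardown path. A userspace sketch of the pattern (hypothetical `struct ses` with a pthread mutex standing in for the kernel spinlock):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum ses_status_sim { SES_GOOD_SIM, SES_EXITING_SIM };

/* Hypothetical stand-in for struct cifs_ses: status is only ever
 * read or written with the lock held. */
struct ses_sim {
	pthread_mutex_t lock;
	enum ses_status_sim status;
};

/* Mirrors cifs_ses_exiting(): take the lock, snapshot the status,
 * drop the lock, return the snapshot.  Callers get a consistent
 * answer even if another thread flips the status concurrently. */
static bool ses_exiting_sim(struct ses_sim *s)
{
	bool ret;

	pthread_mutex_lock(&s->lock);
	ret = s->status == SES_EXITING_SIM;
	pthread_mutex_unlock(&s->lock);
	return ret;
}
```

Note the answer can be stale by the time the caller acts on it; that is fine here because the walkers only use it to skip dying sessions early, and `__cifs_put_smb_ses()` rechecks under the same locks.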
index 0723e1b57256b8fe0d07e0a4698d60074914ec38..8e0a348f1f660ebc14498c7fd7d342693411c106 100644 (file)
@@ -725,31 +725,31 @@ struct super_block *cifs_get_tcon_super(struct cifs_tcon *tcon);
 void cifs_put_tcon_super(struct super_block *sb);
 int cifs_wait_for_server_reconnect(struct TCP_Server_Info *server, bool retry);
 
-/* Put references of @ses and @ses->dfs_root_ses */
+/* Put references of @ses and its children */
 static inline void cifs_put_smb_ses(struct cifs_ses *ses)
 {
-       struct cifs_ses *rses = ses->dfs_root_ses;
+       struct cifs_ses *next;
 
-       __cifs_put_smb_ses(ses);
-       if (rses)
-               __cifs_put_smb_ses(rses);
+       do {
+               next = ses->dfs_root_ses;
+               __cifs_put_smb_ses(ses);
+       } while ((ses = next));
 }
 
-/* Get an active reference of @ses and @ses->dfs_root_ses.
+/* Get an active reference of @ses and its children.
  *
  * NOTE: make sure to call this function when incrementing reference count of
  * @ses to ensure that any DFS root session attached to it (@ses->dfs_root_ses)
  * will also get its reference count incremented.
  *
- * cifs_put_smb_ses() will put both references, so call it when you're done.
+ * cifs_put_smb_ses() will put all references, so call it when you're done.
  */
 static inline void cifs_smb_ses_inc_refcount(struct cifs_ses *ses)
 {
        lockdep_assert_held(&cifs_tcp_ses_lock);
 
-       ses->ses_count++;
-       if (ses->dfs_root_ses)
-               ses->dfs_root_ses->ses_count++;
+       for (; ses; ses = ses->dfs_root_ses)
+               ses->ses_count++;
 }
 
 static inline bool dfs_src_pathname_equal(const char *s1, const char *s2)
index 5aee555515730d78031734da95526ee729416216..23b5709ddc311c7366e33505528711363d151a14 100644 (file)
@@ -5854,10 +5854,8 @@ SetEARetry:
        parm_data->list.EA_flags = 0;
        /* we checked above that name len is less than 255 */
        parm_data->list.name_len = (__u8)name_len;
-       /* EA names are always ASCII */
-       if (ea_name)
-               strncpy(parm_data->list.name, ea_name, name_len);
-       parm_data->list.name[name_len] = '\0';
+       /* EA names are always ASCII and NUL-terminated */
+       strscpy(parm_data->list.name, ea_name ?: "", name_len + 1);
        parm_data->list.value_len = cpu_to_le16(ea_value_len);
        /* caller ensures that ea_value_len is less than 64K but
        we need to ensure that it fits within the smb */
index 9b85b5341822e73d9a4746a027f50121f01873c3..85679ae106fd50a4e3289349e2916204ae3f94fc 100644 (file)
@@ -175,6 +175,8 @@ cifs_signal_cifsd_for_reconnect(struct TCP_Server_Info *server,
 
        spin_lock(&cifs_tcp_ses_lock);
        list_for_each_entry(ses, &pserver->smb_ses_list, smb_ses_list) {
+               if (cifs_ses_exiting(ses))
+                       continue;
                spin_lock(&ses->chan_lock);
                for (i = 0; i < ses->chan_count; i++) {
                        if (!ses->chans[i].server)
@@ -232,7 +234,13 @@ cifs_mark_tcp_ses_conns_for_reconnect(struct TCP_Server_Info *server,
 
        spin_lock(&cifs_tcp_ses_lock);
        list_for_each_entry_safe(ses, nses, &pserver->smb_ses_list, smb_ses_list) {
-               /* check if iface is still active */
+               spin_lock(&ses->ses_lock);
+               if (ses->ses_status == SES_EXITING) {
+                       spin_unlock(&ses->ses_lock);
+                       continue;
+               }
+               spin_unlock(&ses->ses_lock);
+
                spin_lock(&ses->chan_lock);
                if (cifs_ses_get_chan_index(ses, server) ==
                    CIFS_INVAL_CHAN_INDEX) {
@@ -1860,6 +1868,9 @@ static int match_session(struct cifs_ses *ses, struct smb3_fs_context *ctx)
            ctx->sectype != ses->sectype)
                return 0;
 
+       if (ctx->dfs_root_ses != ses->dfs_root_ses)
+               return 0;
+
        /*
         * If an existing session is limited to less channels than
         * requested, it should not be reused
@@ -1963,31 +1974,6 @@ out:
        return rc;
 }
 
-/**
- * cifs_free_ipc - helper to release the session IPC tcon
- * @ses: smb session to unmount the IPC from
- *
- * Needs to be called everytime a session is destroyed.
- *
- * On session close, the IPC is closed and the server must release all tcons of the session.
- * No need to send a tree disconnect here.
- *
- * Besides, it will make the server to not close durable and resilient files on session close, as
- * specified in MS-SMB2 3.3.5.6 Receiving an SMB2 LOGOFF Request.
- */
-static int
-cifs_free_ipc(struct cifs_ses *ses)
-{
-       struct cifs_tcon *tcon = ses->tcon_ipc;
-
-       if (tcon == NULL)
-               return 0;
-
-       tconInfoFree(tcon);
-       ses->tcon_ipc = NULL;
-       return 0;
-}
-
 static struct cifs_ses *
 cifs_find_smb_ses(struct TCP_Server_Info *server, struct smb3_fs_context *ctx)
 {
@@ -2019,48 +2005,52 @@ cifs_find_smb_ses(struct TCP_Server_Info *server, struct smb3_fs_context *ctx)
 void __cifs_put_smb_ses(struct cifs_ses *ses)
 {
        struct TCP_Server_Info *server = ses->server;
+       struct cifs_tcon *tcon;
        unsigned int xid;
        size_t i;
+       bool do_logoff;
        int rc;
 
+       spin_lock(&cifs_tcp_ses_lock);
        spin_lock(&ses->ses_lock);
-       if (ses->ses_status == SES_EXITING) {
+       cifs_dbg(FYI, "%s: id=0x%llx ses_count=%d ses_status=%u ipc=%s\n",
+                __func__, ses->Suid, ses->ses_count, ses->ses_status,
+                ses->tcon_ipc ? ses->tcon_ipc->tree_name : "none");
+       if (ses->ses_status == SES_EXITING || --ses->ses_count > 0) {
                spin_unlock(&ses->ses_lock);
+               spin_unlock(&cifs_tcp_ses_lock);
                return;
        }
-       spin_unlock(&ses->ses_lock);
+       /* ses_count can never go negative */
+       WARN_ON(ses->ses_count < 0);
 
-       cifs_dbg(FYI, "%s: ses_count=%d\n", __func__, ses->ses_count);
-       cifs_dbg(FYI,
-                "%s: ses ipc: %s\n", __func__, ses->tcon_ipc ? ses->tcon_ipc->tree_name : "NONE");
+       spin_lock(&ses->chan_lock);
+       cifs_chan_clear_need_reconnect(ses, server);
+       spin_unlock(&ses->chan_lock);
 
-       spin_lock(&cifs_tcp_ses_lock);
-       if (--ses->ses_count > 0) {
-               spin_unlock(&cifs_tcp_ses_lock);
-               return;
-       }
-       spin_lock(&ses->ses_lock);
-       if (ses->ses_status == SES_GOOD)
-               ses->ses_status = SES_EXITING;
+       do_logoff = ses->ses_status == SES_GOOD && server->ops->logoff;
+       ses->ses_status = SES_EXITING;
+       tcon = ses->tcon_ipc;
+       ses->tcon_ipc = NULL;
        spin_unlock(&ses->ses_lock);
        spin_unlock(&cifs_tcp_ses_lock);
 
-       /* ses_count can never go negative */
-       WARN_ON(ses->ses_count < 0);
-
-       spin_lock(&ses->ses_lock);
-       if (ses->ses_status == SES_EXITING && server->ops->logoff) {
-               spin_unlock(&ses->ses_lock);
-               cifs_free_ipc(ses);
+       /*
+        * On session close, the IPC is closed and the server must release all
+        * tcons of the session.  No need to send a tree disconnect here.
+        *
+        * Besides, it will make the server not close durable and resilient
+        * files on session close, as specified in MS-SMB2 3.3.5.6 Receiving an
+        * SMB2 LOGOFF Request.
+        */
+       tconInfoFree(tcon);
+       if (do_logoff) {
                xid = get_xid();
                rc = server->ops->logoff(xid, ses);
                if (rc)
                        cifs_server_dbg(VFS, "%s: Session Logoff failure rc=%d\n",
                                __func__, rc);
                _free_xid(xid);
-       } else {
-               spin_unlock(&ses->ses_lock);
-               cifs_free_ipc(ses);
        }
 
        spin_lock(&cifs_tcp_ses_lock);
@@ -2373,9 +2363,9 @@ cifs_get_smb_ses(struct TCP_Server_Info *server, struct smb3_fs_context *ctx)
         * need to lock before changing something in the session.
         */
        spin_lock(&cifs_tcp_ses_lock);
+       if (ctx->dfs_root_ses)
+               cifs_smb_ses_inc_refcount(ctx->dfs_root_ses);
        ses->dfs_root_ses = ctx->dfs_root_ses;
-       if (ses->dfs_root_ses)
-               ses->dfs_root_ses->ses_count++;
        list_add(&ses->smb_ses_list, &server->smb_ses_list);
        spin_unlock(&cifs_tcp_ses_lock);
 
@@ -3326,6 +3316,9 @@ void cifs_mount_put_conns(struct cifs_mount_ctx *mnt_ctx)
                cifs_put_smb_ses(mnt_ctx->ses);
        else if (mnt_ctx->server)
                cifs_put_tcp_session(mnt_ctx->server, 0);
+       mnt_ctx->ses = NULL;
+       mnt_ctx->tcon = NULL;
+       mnt_ctx->server = NULL;
        mnt_ctx->cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_POSIX_PATHS;
        free_xid(mnt_ctx->xid);
 }
@@ -3604,8 +3597,6 @@ int cifs_mount(struct cifs_sb_info *cifs_sb, struct smb3_fs_context *ctx)
        bool isdfs;
        int rc;
 
-       INIT_LIST_HEAD(&mnt_ctx.dfs_ses_list);
-
        rc = dfs_mount_share(&mnt_ctx, &isdfs);
        if (rc)
                goto error;
@@ -3636,7 +3627,6 @@ out:
        return rc;
 
 error:
-       dfs_put_root_smb_sessions(&mnt_ctx.dfs_ses_list);
        cifs_mount_put_conns(&mnt_ctx);
        return rc;
 }
@@ -3651,6 +3641,18 @@ int cifs_mount(struct cifs_sb_info *cifs_sb, struct smb3_fs_context *ctx)
                goto error;
 
        rc = cifs_mount_get_tcon(&mnt_ctx);
+       if (!rc) {
+               /*
+                * Prevent superblock from being created with any missing
+                * connections.
+                */
+               if (WARN_ON(!mnt_ctx.server))
+                       rc = -EHOSTDOWN;
+               else if (WARN_ON(!mnt_ctx.ses))
+                       rc = -EACCES;
+               else if (WARN_ON(!mnt_ctx.tcon))
+                       rc = -ENOENT;
+       }
        if (rc)
                goto error;
 
@@ -3988,13 +3990,14 @@ cifs_set_vol_auth(struct smb3_fs_context *ctx, struct cifs_ses *ses)
 }
 
 static struct cifs_tcon *
-cifs_construct_tcon(struct cifs_sb_info *cifs_sb, kuid_t fsuid)
+__cifs_construct_tcon(struct cifs_sb_info *cifs_sb, kuid_t fsuid)
 {
        int rc;
        struct cifs_tcon *master_tcon = cifs_sb_master_tcon(cifs_sb);
        struct cifs_ses *ses;
        struct cifs_tcon *tcon = NULL;
        struct smb3_fs_context *ctx;
+       char *origin_fullpath = NULL;
 
        ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
        if (ctx == NULL)
@@ -4018,6 +4021,7 @@ cifs_construct_tcon(struct cifs_sb_info *cifs_sb, kuid_t fsuid)
        ctx->sign = master_tcon->ses->sign;
        ctx->seal = master_tcon->seal;
        ctx->witness = master_tcon->use_witness;
+       ctx->dfs_root_ses = master_tcon->ses->dfs_root_ses;
 
        rc = cifs_set_vol_auth(ctx, master_tcon->ses);
        if (rc) {
@@ -4037,12 +4041,39 @@ cifs_construct_tcon(struct cifs_sb_info *cifs_sb, kuid_t fsuid)
                goto out;
        }
 
+#ifdef CONFIG_CIFS_DFS_UPCALL
+       spin_lock(&master_tcon->tc_lock);
+       if (master_tcon->origin_fullpath) {
+               spin_unlock(&master_tcon->tc_lock);
+               origin_fullpath = dfs_get_path(cifs_sb, cifs_sb->ctx->source);
+               if (IS_ERR(origin_fullpath)) {
+                       tcon = ERR_CAST(origin_fullpath);
+                       origin_fullpath = NULL;
+                       cifs_put_smb_ses(ses);
+                       goto out;
+               }
+       } else {
+               spin_unlock(&master_tcon->tc_lock);
+       }
+#endif
+
        tcon = cifs_get_tcon(ses, ctx);
        if (IS_ERR(tcon)) {
                cifs_put_smb_ses(ses);
                goto out;
        }
 
+#ifdef CONFIG_CIFS_DFS_UPCALL
+       if (origin_fullpath) {
+               spin_lock(&tcon->tc_lock);
+               tcon->origin_fullpath = origin_fullpath;
+               spin_unlock(&tcon->tc_lock);
+               origin_fullpath = NULL;
+               queue_delayed_work(dfscache_wq, &tcon->dfs_cache_work,
+                                  dfs_cache_get_ttl() * HZ);
+       }
+#endif
+
 #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY
        if (cap_unix(ses))
                reset_cifs_unix_caps(0, tcon, NULL, ctx);
@@ -4051,11 +4082,23 @@ cifs_construct_tcon(struct cifs_sb_info *cifs_sb, kuid_t fsuid)
 out:
        kfree(ctx->username);
        kfree_sensitive(ctx->password);
+       kfree(origin_fullpath);
        kfree(ctx);
 
        return tcon;
 }
 
+static struct cifs_tcon *
+cifs_construct_tcon(struct cifs_sb_info *cifs_sb, kuid_t fsuid)
+{
+       struct cifs_tcon *ret;
+
+       cifs_mount_lock();
+       ret = __cifs_construct_tcon(cifs_sb, fsuid);
+       cifs_mount_unlock();
+       return ret;
+}
+
 struct cifs_tcon *
 cifs_sb_master_tcon(struct cifs_sb_info *cifs_sb)
 {
index 449c59830039bc04897e5031dba2dbc9c6649bad..3ec965547e3d4d5979da80f41681b291c29c9256 100644 (file)
@@ -66,33 +66,20 @@ static int get_session(struct cifs_mount_ctx *mnt_ctx, const char *full_path)
 }
 
 /*
- * Track individual DFS referral servers used by new DFS mount.
- *
- * On success, their lifetime will be shared by final tcon (dfs_ses_list).
- * Otherwise, they will be put by dfs_put_root_smb_sessions() in cifs_mount().
+ * Get an active reference of @ses so that next call to cifs_put_tcon() won't
+ * release it as any new DFS referrals must go through its IPC tcon.
  */
-static int add_root_smb_session(struct cifs_mount_ctx *mnt_ctx)
+static void add_root_smb_session(struct cifs_mount_ctx *mnt_ctx)
 {
        struct smb3_fs_context *ctx = mnt_ctx->fs_ctx;
-       struct dfs_root_ses *root_ses;
        struct cifs_ses *ses = mnt_ctx->ses;
 
        if (ses) {
-               root_ses = kmalloc(sizeof(*root_ses), GFP_KERNEL);
-               if (!root_ses)
-                       return -ENOMEM;
-
-               INIT_LIST_HEAD(&root_ses->list);
-
                spin_lock(&cifs_tcp_ses_lock);
                cifs_smb_ses_inc_refcount(ses);
                spin_unlock(&cifs_tcp_ses_lock);
-               root_ses->ses = ses;
-               list_add_tail(&root_ses->list, &mnt_ctx->dfs_ses_list);
        }
-       /* Select new DFS referral server so that new referrals go through it */
        ctx->dfs_root_ses = ses;
-       return 0;
 }
 
 static inline int parse_dfs_target(struct smb3_fs_context *ctx,
@@ -185,11 +172,8 @@ again:
                                        continue;
                        }
 
-                       if (is_refsrv) {
-                               rc = add_root_smb_session(mnt_ctx);
-                               if (rc)
-                                       goto out;
-                       }
+                       if (is_refsrv)
+                               add_root_smb_session(mnt_ctx);
 
                        rc = ref_walk_advance(rw);
                        if (!rc) {
@@ -232,6 +216,7 @@ static int __dfs_mount_share(struct cifs_mount_ctx *mnt_ctx)
        struct smb3_fs_context *ctx = mnt_ctx->fs_ctx;
        struct cifs_tcon *tcon;
        char *origin_fullpath;
+       bool new_tcon = true;
        int rc;
 
        origin_fullpath = dfs_get_path(cifs_sb, ctx->source);
@@ -239,6 +224,18 @@ static int __dfs_mount_share(struct cifs_mount_ctx *mnt_ctx)
                return PTR_ERR(origin_fullpath);
 
        rc = dfs_referral_walk(mnt_ctx);
+       if (!rc) {
+               /*
+                * Prevent superblock from being created with any missing
+                * connections.
+                */
+               if (WARN_ON(!mnt_ctx->server))
+                       rc = -EHOSTDOWN;
+               else if (WARN_ON(!mnt_ctx->ses))
+                       rc = -EACCES;
+               else if (WARN_ON(!mnt_ctx->tcon))
+                       rc = -ENOENT;
+       }
        if (rc)
                goto out;
 
@@ -247,15 +244,14 @@ static int __dfs_mount_share(struct cifs_mount_ctx *mnt_ctx)
        if (!tcon->origin_fullpath) {
                tcon->origin_fullpath = origin_fullpath;
                origin_fullpath = NULL;
+       } else {
+               new_tcon = false;
        }
        spin_unlock(&tcon->tc_lock);
 
-       if (list_empty(&tcon->dfs_ses_list)) {
-               list_replace_init(&mnt_ctx->dfs_ses_list, &tcon->dfs_ses_list);
+       if (new_tcon) {
                queue_delayed_work(dfscache_wq, &tcon->dfs_cache_work,
                                   dfs_cache_get_ttl() * HZ);
-       } else {
-               dfs_put_root_smb_sessions(&mnt_ctx->dfs_ses_list);
        }
 
 out:
@@ -298,7 +294,6 @@ int dfs_mount_share(struct cifs_mount_ctx *mnt_ctx, bool *isdfs)
        if (rc)
                return rc;
 
-       ctx->dfs_root_ses = mnt_ctx->ses;
        /*
         * If called with 'nodfs' mount option, then skip DFS resolving.  Otherwise unconditionally
         * try to get an DFS referral (even cached) to determine whether it is an DFS mount.
@@ -324,7 +319,9 @@ int dfs_mount_share(struct cifs_mount_ctx *mnt_ctx, bool *isdfs)
 
        *isdfs = true;
        add_root_smb_session(mnt_ctx);
-       return __dfs_mount_share(mnt_ctx);
+       rc = __dfs_mount_share(mnt_ctx);
+       dfs_put_root_smb_sessions(mnt_ctx);
+       return rc;
 }
 
 /* Update dfs referral path of superblock */
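The new tail of `dfs_mount_share()` drops the extra root-session references on every return path, so the caller no longer needs a cleanup call in its error path. A small sketch of that shape, with hypothetical stand-ins for `add_root_smb_session()` and `dfs_put_root_smb_sessions()`:

```c
#include <assert.h>
#include <errno.h>

struct mount_ctx {
        int root_refs;          /* extra references held for the DFS walk */
};

static void take_root_ref(struct mount_ctx *c)  { c->root_refs++; }
static void put_root_refs(struct mount_ctx *c)  { c->root_refs--; }

/* Hypothetical mount body; @fail forces the error path. */
static int do_mount(struct mount_ctx *c, int fail)
{
        (void)c;
        return fail ? -EHOSTDOWN : 0;
}

/*
 * Mirrors the patched dfs_mount_share(): take the extra reference,
 * run the mount, then put the reference regardless of the outcome.
 */
static int mount_share(struct mount_ctx *c, int fail)
{
        int rc;

        take_root_ref(c);
        rc = do_mount(c, fail);
        put_root_refs(c);
        return rc;
}
```

Balancing the put against the take in the same function is what lets the diff delete `dfs_put_root_smb_sessions()` from `cifs_mount()`'s error label.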
index 875ab7ae57fcdf4493237084d41c6e3617128623..e5c4dcf837503aa2851f9b01680b6f5b7eb8874d 100644 (file)
@@ -7,7 +7,9 @@
 #define _CIFS_DFS_H
 
 #include "cifsglob.h"
+#include "cifsproto.h"
 #include "fs_context.h"
+#include "dfs_cache.h"
 #include "cifs_unicode.h"
 #include <linux/namei.h>
 
@@ -114,11 +116,6 @@ static inline void ref_walk_set_tgt_hint(struct dfs_ref_walk *rw)
                                       ref_walk_tit(rw));
 }
 
-struct dfs_root_ses {
-       struct list_head list;
-       struct cifs_ses *ses;
-};
-
 int dfs_parse_target_referral(const char *full_path, const struct dfs_info3_param *ref,
                              struct smb3_fs_context *ctx);
 int dfs_mount_share(struct cifs_mount_ctx *mnt_ctx, bool *isdfs);
@@ -133,20 +130,32 @@ static inline int dfs_get_referral(struct cifs_mount_ctx *mnt_ctx, const char *p
 {
        struct smb3_fs_context *ctx = mnt_ctx->fs_ctx;
        struct cifs_sb_info *cifs_sb = mnt_ctx->cifs_sb;
+       struct cifs_ses *rses = ctx->dfs_root_ses ?: mnt_ctx->ses;
 
-       return dfs_cache_find(mnt_ctx->xid, ctx->dfs_root_ses, cifs_sb->local_nls,
+       return dfs_cache_find(mnt_ctx->xid, rses, cifs_sb->local_nls,
                              cifs_remap(cifs_sb), path, ref, tl);
 }
 
-static inline void dfs_put_root_smb_sessions(struct list_head *head)
+/*
+ * cifs_get_smb_ses() already guarantees an active reference of
+ * @ses->dfs_root_ses when a new session is created, so we need to put extra
+ * references of all DFS root sessions that were used across the mount process
+ * in dfs_mount_share().
+ */
+static inline void dfs_put_root_smb_sessions(struct cifs_mount_ctx *mnt_ctx)
 {
-       struct dfs_root_ses *root, *tmp;
+       const struct smb3_fs_context *ctx = mnt_ctx->fs_ctx;
+       struct cifs_ses *ses = ctx->dfs_root_ses;
+       struct cifs_ses *cur;
+
+       if (!ses)
+               return;
 
-       list_for_each_entry_safe(root, tmp, head, list) {
-               list_del_init(&root->list);
-               cifs_put_smb_ses(root->ses);
-               kfree(root);
+       for (cur = ses; cur; cur = cur->dfs_root_ses) {
+               if (cur->dfs_root_ses)
+                       cifs_put_smb_ses(cur->dfs_root_ses);
        }
+       cifs_put_smb_ses(ses);
 }
 
 #endif /* _CIFS_DFS_H */
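The rewritten `dfs_put_root_smb_sessions()` replaces the old `dfs_root_ses` list with a walk up the session's parent chain: each parent is put by its child, and the head is put explicitly, so every session in the chain gives up exactly one reference. A self-contained sketch with a simplified refcount (types here are illustrative, not the real `struct cifs_ses`):

```c
#include <assert.h>
#include <stddef.h>

struct ses {
        int refcount;
        struct ses *dfs_root_ses;       /* parent DFS root session, or NULL */
};

static void put_ses(struct ses *s)
{
        s->refcount--;
}

/* Same shape as the new dfs_put_root_smb_sessions(). */
static void put_root_sessions(struct ses *ses)
{
        struct ses *cur;

        if (!ses)
                return;

        for (cur = ses; cur; cur = cur->dfs_root_ses) {
                if (cur->dfs_root_ses)
                        put_ses(cur->dfs_root_ses);
        }
        put_ses(ses);
}
```

Note the parent is put *before* the walk advances to it; that is safe here because these are extra references on top of the one `cifs_get_smb_ses()` already guarantees, so no session is freed mid-walk.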
index 508d831fabe37899fb016b05431307e0c3cd8a80..11c8efecf7aa128d30527ac1e44a3124f03ec5fb 100644 (file)
@@ -1172,8 +1172,8 @@ static bool is_ses_good(struct cifs_ses *ses)
        return ret;
 }
 
-/* Refresh dfs referral of tcon and mark it for reconnect if needed */
-static int __refresh_tcon(const char *path, struct cifs_ses *ses, bool force_refresh)
+/* Refresh dfs referral of @ses and mark it for reconnect if needed */
+static void __refresh_ses_referral(struct cifs_ses *ses, bool force_refresh)
 {
        struct TCP_Server_Info *server = ses->server;
        DFS_CACHE_TGT_LIST(old_tl);
@@ -1181,10 +1181,21 @@ static int __refresh_tcon(const char *path, struct cifs_ses *ses, bool force_ref
        bool needs_refresh = false;
        struct cache_entry *ce;
        unsigned int xid;
+       char *path = NULL;
        int rc = 0;
 
        xid = get_xid();
 
+       mutex_lock(&server->refpath_lock);
+       if (server->leaf_fullpath) {
+               path = kstrdup(server->leaf_fullpath + 1, GFP_ATOMIC);
+               if (!path)
+                       rc = -ENOMEM;
+       }
+       mutex_unlock(&server->refpath_lock);
+       if (!path)
+               goto out;
+
        down_read(&htable_rw_lock);
        ce = lookup_cache_entry(path);
        needs_refresh = force_refresh || IS_ERR(ce) || cache_entry_expired(ce);
@@ -1218,19 +1229,17 @@ out:
        free_xid(xid);
        dfs_cache_free_tgts(&old_tl);
        dfs_cache_free_tgts(&new_tl);
-       return rc;
+       kfree(path);
 }
 
-static int refresh_tcon(struct cifs_tcon *tcon, bool force_refresh)
+static inline void refresh_ses_referral(struct cifs_ses *ses)
 {
-       struct TCP_Server_Info *server = tcon->ses->server;
-       struct cifs_ses *ses = tcon->ses;
+       __refresh_ses_referral(ses, false);
+}
 
-       mutex_lock(&server->refpath_lock);
-       if (server->leaf_fullpath)
-               __refresh_tcon(server->leaf_fullpath + 1, ses, force_refresh);
-       mutex_unlock(&server->refpath_lock);
-       return 0;
+static inline void force_refresh_ses_referral(struct cifs_ses *ses)
+{
+       __refresh_ses_referral(ses, true);
 }
 
 /**
@@ -1271,34 +1280,20 @@ int dfs_cache_remount_fs(struct cifs_sb_info *cifs_sb)
         */
        cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_USE_PREFIX_PATH;
 
-       return refresh_tcon(tcon, true);
+       force_refresh_ses_referral(tcon->ses);
+       return 0;
 }
 
 /* Refresh all DFS referrals related to DFS tcon */
 void dfs_cache_refresh(struct work_struct *work)
 {
-       struct TCP_Server_Info *server;
-       struct dfs_root_ses *rses;
        struct cifs_tcon *tcon;
        struct cifs_ses *ses;
 
        tcon = container_of(work, struct cifs_tcon, dfs_cache_work.work);
-       ses = tcon->ses;
-       server = ses->server;
 
-       mutex_lock(&server->refpath_lock);
-       if (server->leaf_fullpath)
-               __refresh_tcon(server->leaf_fullpath + 1, ses, false);
-       mutex_unlock(&server->refpath_lock);
-
-       list_for_each_entry(rses, &tcon->dfs_ses_list, list) {
-               ses = rses->ses;
-               server = ses->server;
-               mutex_lock(&server->refpath_lock);
-               if (server->leaf_fullpath)
-                       __refresh_tcon(server->leaf_fullpath + 1, ses, false);
-               mutex_unlock(&server->refpath_lock);
-       }
+       for (ses = tcon->ses; ses; ses = ses->dfs_root_ses)
+               refresh_ses_referral(ses);
 
        queue_delayed_work(dfscache_wq, &tcon->dfs_cache_work,
                           atomic_read(&dfs_cache_ttl) * HZ);
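With the `dfs_root_ses` list gone, `dfs_cache_refresh()` above also becomes a walk of the same parent chain, refreshing each session's referral in turn. A sketch of just that loop (field and function names are stand-ins for `refresh_ses_referral()` and the real session struct):

```c
#include <assert.h>
#include <stddef.h>

struct ses {
        struct ses *dfs_root_ses;       /* parent DFS root session, or NULL */
        int refreshed;                  /* counts refresh calls, for demo */
};

/* Stand-in for refresh_ses_referral(ses). */
static void refresh(struct ses *s)
{
        s->refreshed++;
}

/* The new dfs_cache_refresh() loop: walk from the tcon's session up. */
static void refresh_chain(struct ses *ses)
{
        for (; ses; ses = ses->dfs_root_ses)
                refresh(ses);
}
```

Because each child holds a reference on its parent (see `dfs_put_root_smb_sessions()` in dfs.h), the chain cannot be torn down underneath this walk.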
index d11dc3aa458ba2d545c5a80786230b54cb62166b..864b194dbaa0a0bffc3cecac0a3c9fd4ccc9b7a4 100644 (file)
@@ -189,6 +189,7 @@ static int cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned
        int disposition;
        struct TCP_Server_Info *server = tcon->ses->server;
        struct cifs_open_parms oparms;
+       int rdwr_for_fscache = 0;
 
        *oplock = 0;
        if (tcon->ses->server->oplocks)
@@ -200,6 +201,10 @@ static int cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned
                return PTR_ERR(full_path);
        }
 
+       /* If we're caching, we need to be able to fill in around partial writes. */
+       if (cifs_fscache_enabled(inode) && (oflags & O_ACCMODE) == O_WRONLY)
+               rdwr_for_fscache = 1;
+
 #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY
        if (tcon->unix_ext && cap_unix(tcon->ses) && !tcon->broken_posix_open &&
            (CIFS_UNIX_POSIX_PATH_OPS_CAP &
@@ -276,6 +281,8 @@ static int cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned
                desired_access |= GENERIC_READ; /* is this too little? */
        if (OPEN_FMODE(oflags) & FMODE_WRITE)
                desired_access |= GENERIC_WRITE;
+       if (rdwr_for_fscache == 1)
+               desired_access |= GENERIC_READ;
 
        disposition = FILE_OVERWRITE_IF;
        if ((oflags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL))
@@ -304,6 +311,7 @@ static int cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned
        if (!tcon->unix_ext && (mode & S_IWUGO) == 0)
                create_options |= CREATE_OPTION_READONLY;
 
+retry_open:
        oparms = (struct cifs_open_parms) {
                .tcon = tcon,
                .cifs_sb = cifs_sb,
@@ -317,8 +325,15 @@ static int cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned
        rc = server->ops->open(xid, &oparms, oplock, buf);
        if (rc) {
                cifs_dbg(FYI, "cifs_create returned 0x%x\n", rc);
+               if (rc == -EACCES && rdwr_for_fscache == 1) {
+                       desired_access &= ~GENERIC_READ;
+                       rdwr_for_fscache = 2;
+                       goto retry_open;
+               }
                goto out;
        }
+       if (rdwr_for_fscache == 2)
+               cifs_invalidate_cache(inode, FSCACHE_INVAL_DIO_WRITE);
 
 #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY
        /*
index 16aadce492b2ec67b973c8726e7cefd4857197d5..9be37d0fe724e90af1981c109512a64f01d86a0a 100644 (file)
@@ -206,12 +206,12 @@ cifs_mark_open_files_invalid(struct cifs_tcon *tcon)
         */
 }
 
-static inline int cifs_convert_flags(unsigned int flags)
+static inline int cifs_convert_flags(unsigned int flags, int rdwr_for_fscache)
 {
        if ((flags & O_ACCMODE) == O_RDONLY)
                return GENERIC_READ;
        else if ((flags & O_ACCMODE) == O_WRONLY)
-               return GENERIC_WRITE;
+               return rdwr_for_fscache == 1 ? (GENERIC_READ | GENERIC_WRITE) : GENERIC_WRITE;
        else if ((flags & O_ACCMODE) == O_RDWR) {
                /* GENERIC_ALL is too much permission to request
                   can cause unnecessary access denied on create */
@@ -348,11 +348,16 @@ static int cifs_nt_open(const char *full_path, struct inode *inode, struct cifs_
        int create_options = CREATE_NOT_DIR;
        struct TCP_Server_Info *server = tcon->ses->server;
        struct cifs_open_parms oparms;
+       int rdwr_for_fscache = 0;
 
        if (!server->ops->open)
                return -ENOSYS;
 
-       desired_access = cifs_convert_flags(f_flags);
+       /* If we're caching, we need to be able to fill in around partial writes. */
+       if (cifs_fscache_enabled(inode) && (f_flags & O_ACCMODE) == O_WRONLY)
+               rdwr_for_fscache = 1;
+
+       desired_access = cifs_convert_flags(f_flags, rdwr_for_fscache);
 
 /*********************************************************************
  *  open flag mapping table:
@@ -389,6 +394,7 @@ static int cifs_nt_open(const char *full_path, struct inode *inode, struct cifs_
        if (f_flags & O_DIRECT)
                create_options |= CREATE_NO_BUFFER;
 
+retry_open:
        oparms = (struct cifs_open_parms) {
                .tcon = tcon,
                .cifs_sb = cifs_sb,
@@ -400,8 +406,16 @@ static int cifs_nt_open(const char *full_path, struct inode *inode, struct cifs_
        };
 
        rc = server->ops->open(xid, &oparms, oplock, buf);
-       if (rc)
+       if (rc) {
+               if (rc == -EACCES && rdwr_for_fscache == 1) {
+                       desired_access = cifs_convert_flags(f_flags, 0);
+                       rdwr_for_fscache = 2;
+                       goto retry_open;
+               }
                return rc;
+       }
+       if (rdwr_for_fscache == 2)
+               cifs_invalidate_cache(inode, FSCACHE_INVAL_DIO_WRITE);
 
        /* TODO: Add support for calling posix query info but with passing in fid */
        if (tcon->unix_ext)
@@ -445,6 +459,7 @@ cifs_down_write(struct rw_semaphore *sem)
 }
 
 static void cifsFileInfo_put_work(struct work_struct *work);
+void serverclose_work(struct work_struct *work);
 
 struct cifsFileInfo *cifs_new_fileinfo(struct cifs_fid *fid, struct file *file,
                                       struct tcon_link *tlink, __u32 oplock,
@@ -491,6 +506,7 @@ struct cifsFileInfo *cifs_new_fileinfo(struct cifs_fid *fid, struct file *file,
        cfile->tlink = cifs_get_tlink(tlink);
        INIT_WORK(&cfile->oplock_break, cifs_oplock_break);
        INIT_WORK(&cfile->put, cifsFileInfo_put_work);
+       INIT_WORK(&cfile->serverclose, serverclose_work);
        INIT_DELAYED_WORK(&cfile->deferred, smb2_deferred_work_close);
        mutex_init(&cfile->fh_mutex);
        spin_lock_init(&cfile->file_info_lock);
@@ -582,6 +598,40 @@ static void cifsFileInfo_put_work(struct work_struct *work)
        cifsFileInfo_put_final(cifs_file);
 }
 
+void serverclose_work(struct work_struct *work)
+{
+       struct cifsFileInfo *cifs_file = container_of(work,
+                       struct cifsFileInfo, serverclose);
+
+       struct cifs_tcon *tcon = tlink_tcon(cifs_file->tlink);
+
+       struct TCP_Server_Info *server = tcon->ses->server;
+       int rc = 0;
+       int retries = 0;
+       int MAX_RETRIES = 4;
+
+       do {
+               if (server->ops->close_getattr)
+                       rc = server->ops->close_getattr(0, tcon, cifs_file);
+               else if (server->ops->close)
+                       rc = server->ops->close(0, tcon, &cifs_file->fid);
+
+               if (rc == -EBUSY || rc == -EAGAIN) {
+                       retries++;
+                       msleep(250);
+               }
+       } while ((rc == -EBUSY || rc == -EAGAIN) &&
+                (retries < MAX_RETRIES));
+
+       if (retries == MAX_RETRIES)
+               pr_warn("Serverclose failed %d times, giving up\n", MAX_RETRIES);
+
+       if (cifs_file->offload)
+               queue_work(fileinfo_put_wq, &cifs_file->put);
+       else
+               cifsFileInfo_put_final(cifs_file);
+}
+
 /**
  * cifsFileInfo_put - release a reference of file priv data
  *
@@ -622,10 +672,13 @@ void _cifsFileInfo_put(struct cifsFileInfo *cifs_file,
        struct cifs_fid fid = {};
        struct cifs_pending_open open;
        bool oplock_break_cancelled;
+       bool serverclose_offloaded = false;
 
        spin_lock(&tcon->open_file_lock);
        spin_lock(&cifsi->open_file_lock);
        spin_lock(&cifs_file->file_info_lock);
+
+       cifs_file->offload = offload;
        if (--cifs_file->count > 0) {
                spin_unlock(&cifs_file->file_info_lock);
                spin_unlock(&cifsi->open_file_lock);
@@ -667,13 +720,20 @@ void _cifsFileInfo_put(struct cifsFileInfo *cifs_file,
        if (!tcon->need_reconnect && !cifs_file->invalidHandle) {
                struct TCP_Server_Info *server = tcon->ses->server;
                unsigned int xid;
+               int rc = 0;
 
                xid = get_xid();
                if (server->ops->close_getattr)
-                       server->ops->close_getattr(xid, tcon, cifs_file);
+                       rc = server->ops->close_getattr(xid, tcon, cifs_file);
                else if (server->ops->close)
-                       server->ops->close(xid, tcon, &cifs_file->fid);
+                       rc = server->ops->close(xid, tcon, &cifs_file->fid);
                _free_xid(xid);
+
+               if (rc == -EBUSY || rc == -EAGAIN) {
+                       // Server close failed, hence offloading it as an async op
+                       queue_work(serverclose_wq, &cifs_file->serverclose);
+                       serverclose_offloaded = true;
+               }
        }
 
        if (oplock_break_cancelled)
@@ -681,10 +741,15 @@ void _cifsFileInfo_put(struct cifsFileInfo *cifs_file,
 
        cifs_del_pending_open(&open);
 
-       if (offload)
-               queue_work(fileinfo_put_wq, &cifs_file->put);
-       else
-               cifsFileInfo_put_final(cifs_file);
+       // if serverclose has been offloaded to wq (on failure), it will
+       // handle offloading put as well. If serverclose not offloaded,
+       // we need to handle offloading put here.
+       if (!serverclose_offloaded) {
+               if (offload)
+                       queue_work(fileinfo_put_wq, &cifs_file->put);
+               else
+                       cifsFileInfo_put_final(cifs_file);
+       }
 }
 
 int cifs_open(struct inode *inode, struct file *file)
@@ -834,11 +899,11 @@ int cifs_open(struct inode *inode, struct file *file)
 use_cache:
        fscache_use_cookie(cifs_inode_cookie(file_inode(file)),
                           file->f_mode & FMODE_WRITE);
-       if (file->f_flags & O_DIRECT &&
-           (!((file->f_flags & O_ACCMODE) != O_RDONLY) ||
-            file->f_flags & O_APPEND))
-               cifs_invalidate_cache(file_inode(file),
-                                     FSCACHE_INVAL_DIO_WRITE);
+       if (!(file->f_flags & O_DIRECT))
+               goto out;
+       if ((file->f_flags & (O_ACCMODE | O_APPEND)) == O_RDONLY)
+               goto out;
+       cifs_invalidate_cache(file_inode(file), FSCACHE_INVAL_DIO_WRITE);
 
 out:
        free_dentry_path(page);
@@ -903,6 +968,7 @@ cifs_reopen_file(struct cifsFileInfo *cfile, bool can_flush)
        int disposition = FILE_OPEN;
        int create_options = CREATE_NOT_DIR;
        struct cifs_open_parms oparms;
+       int rdwr_for_fscache = 0;
 
        xid = get_xid();
        mutex_lock(&cfile->fh_mutex);
@@ -966,7 +1032,11 @@ cifs_reopen_file(struct cifsFileInfo *cfile, bool can_flush)
        }
 #endif /* CONFIG_CIFS_ALLOW_INSECURE_LEGACY */
 
-       desired_access = cifs_convert_flags(cfile->f_flags);
+       /* If we're caching, we need to be able to fill in around partial writes. */
+       if (cifs_fscache_enabled(inode) && (cfile->f_flags & O_ACCMODE) == O_WRONLY)
+               rdwr_for_fscache = 1;
+
+       desired_access = cifs_convert_flags(cfile->f_flags, rdwr_for_fscache);
 
        /* O_SYNC also has bit for O_DSYNC so following check picks up either */
        if (cfile->f_flags & O_SYNC)
@@ -978,6 +1048,7 @@ cifs_reopen_file(struct cifsFileInfo *cfile, bool can_flush)
        if (server->ops->get_lease_key)
                server->ops->get_lease_key(inode, &cfile->fid);
 
+retry_open:
        oparms = (struct cifs_open_parms) {
                .tcon = tcon,
                .cifs_sb = cifs_sb,
@@ -1003,6 +1074,11 @@ cifs_reopen_file(struct cifsFileInfo *cfile, bool can_flush)
                /* indicate that we need to relock the file */
                oparms.reconnect = true;
        }
+       if (rc == -EACCES && rdwr_for_fscache == 1) {
+               desired_access = cifs_convert_flags(cfile->f_flags, 0);
+               rdwr_for_fscache = 2;
+               goto retry_open;
+       }
 
        if (rc) {
                mutex_unlock(&cfile->fh_mutex);
@@ -1011,6 +1087,9 @@ cifs_reopen_file(struct cifsFileInfo *cfile, bool can_flush)
                goto reopen_error_exit;
        }
 
+       if (rdwr_for_fscache == 2)
+               cifs_invalidate_cache(inode, FSCACHE_INVAL_DIO_WRITE);
+
 #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY
 reopen_success:
 #endif /* CONFIG_CIFS_ALLOW_INSECURE_LEGACY */
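The `rdwr_for_fscache` changes in this file (and in the create path above) all follow one state machine: a write-only open is first tried with `GENERIC_READ` added so the local cache can fill in around partial writes (state 1); if the server answers `-EACCES`, the open is retried without read access (state 2) and the cache is invalidated since it can no longer be kept coherent. A userspace sketch with a mock server (the mock and its flags are hypothetical):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define GENERIC_READ  0x1
#define GENERIC_WRITE 0x2

/* Mock server: refuses opens that request read when @read_denied. */
static int server_open(int access, bool read_denied)
{
        if (read_denied && (access & GENERIC_READ))
                return -EACCES;
        return 0;
}

/*
 * Mirrors the retry_open logic: state 0 = not caching, 1 = caching and
 * asking for extra read, 2 = retried without read; state 2 on success
 * means the cache must be invalidated.
 */
static int open_for_write(bool caching, bool read_denied, bool *invalidate)
{
        int rdwr_for_fscache = caching ? 1 : 0;
        int access = GENERIC_WRITE | (rdwr_for_fscache ? GENERIC_READ : 0);
        int rc;

retry_open:
        rc = server_open(access, read_denied);
        if (rc == -EACCES && rdwr_for_fscache == 1) {
                access &= ~GENERIC_READ;
                rdwr_for_fscache = 2;
                goto retry_open;
        }
        *invalidate = (rc == 0 && rdwr_for_fscache == 2);
        return rc;
}
```

The same three-state pattern appears in `cifs_do_create()`, `cifs_nt_open()`, and `cifs_reopen_file()`, which is why `cifs_convert_flags()` grew the extra parameter.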
index bdcbe6ff2739ab4539c128e7945258a8914a477a..b7bfe705b2c498b83a60131713246bb9d37abf98 100644 (file)
@@ -37,7 +37,7 @@
 #include "rfc1002pdu.h"
 #include "fs_context.h"
 
-static DEFINE_MUTEX(cifs_mount_mutex);
+DEFINE_MUTEX(cifs_mount_mutex);
 
 static const match_table_t cifs_smb_version_tokens = {
        { Smb_1, SMB1_VERSION_STRING },
@@ -783,9 +783,9 @@ static int smb3_get_tree(struct fs_context *fc)
 
        if (err)
                return err;
-       mutex_lock(&cifs_mount_mutex);
+       cifs_mount_lock();
        ret = smb3_get_tree_common(fc);
-       mutex_unlock(&cifs_mount_mutex);
+       cifs_mount_unlock();
        return ret;
 }
 
index 7863f2248c4df8f1e892c2b8589813cca1f1bdd4..8a35645e0b65b244741da59177a2bcb0acea0256 100644 (file)
@@ -304,4 +304,16 @@ extern void smb3_update_mnt_flags(struct cifs_sb_info *cifs_sb);
 #define MAX_CACHED_FIDS 16
 extern char *cifs_sanitize_prepath(char *prepath, gfp_t gfp);
 
+extern struct mutex cifs_mount_mutex;
+
+static inline void cifs_mount_lock(void)
+{
+       mutex_lock(&cifs_mount_mutex);
+}
+
+static inline void cifs_mount_unlock(void)
+{
+       mutex_unlock(&cifs_mount_mutex);
+}
+
 #endif
index a3d73720914f888cead5bec48c96c633f21370e8..1f2ea9f5cc9a8a5f900a5a5367cf093d54f2f80d 100644 (file)
@@ -109,6 +109,11 @@ static inline void cifs_readahead_to_fscache(struct inode *inode,
                __cifs_readahead_to_fscache(inode, pos, len);
 }
 
+static inline bool cifs_fscache_enabled(struct inode *inode)
+{
+       return fscache_cookie_enabled(cifs_inode_cookie(inode));
+}
+
 #else /* CONFIG_CIFS_FSCACHE */
 static inline
 void cifs_fscache_fill_coherency(struct inode *inode,
@@ -124,6 +129,7 @@ static inline void cifs_fscache_release_inode_cookie(struct inode *inode) {}
 static inline void cifs_fscache_unuse_inode_cookie(struct inode *inode, bool update) {}
 static inline struct fscache_cookie *cifs_inode_cookie(struct inode *inode) { return NULL; }
 static inline void cifs_invalidate_cache(struct inode *inode, unsigned int flags) {}
+static inline bool cifs_fscache_enabled(struct inode *inode) { return false; }
 
 static inline int cifs_fscache_query_occupancy(struct inode *inode,
                                               pgoff_t first, unsigned int nr_pages,
index c012dfdba80d457e27dc51996c55309840d3f892..855ac5a62edfaa50215cfed46e361dcb79f0c8fc 100644 (file)
@@ -247,7 +247,9 @@ static int cifs_dump_full_key(struct cifs_tcon *tcon, struct smb3_full_key_debug
                spin_lock(&cifs_tcp_ses_lock);
                list_for_each_entry(server_it, &cifs_tcp_ses_list, tcp_ses_list) {
                        list_for_each_entry(ses_it, &server_it->smb_ses_list, smb_ses_list) {
-                               if (ses_it->Suid == out.session_id) {
+                               spin_lock(&ses_it->ses_lock);
+                               if (ses_it->ses_status != SES_EXITING &&
+                                   ses_it->Suid == out.session_id) {
                                        ses = ses_it;
                                        /*
                                         * since we are using the session outside the crit
@@ -255,9 +257,11 @@ static int cifs_dump_full_key(struct cifs_tcon *tcon, struct smb3_full_key_debug
                                         * so increment its refcount
                                         */
                                        cifs_smb_ses_inc_refcount(ses);
+                                       spin_unlock(&ses_it->ses_lock);
                                        found = true;
                                        goto search_end;
                                }
+                               spin_unlock(&ses_it->ses_lock);
                        }
                }
 search_end:
index c3771fc81328ff12d7d075cadab77c97b16b7e05..33ac4f8f5050c416cd2004ee4516756edd3b11d8 100644 (file)
@@ -138,9 +138,6 @@ tcon_info_alloc(bool dir_leases_enabled)
        atomic_set(&ret_buf->num_local_opens, 0);
        atomic_set(&ret_buf->num_remote_opens, 0);
        ret_buf->stats_from_time = ktime_get_real_seconds();
-#ifdef CONFIG_CIFS_DFS_UPCALL
-       INIT_LIST_HEAD(&ret_buf->dfs_ses_list);
-#endif
 
        return ret_buf;
 }
@@ -156,9 +153,6 @@ tconInfoFree(struct cifs_tcon *tcon)
        atomic_dec(&tconInfoAllocCount);
        kfree(tcon->nativeFileSystem);
        kfree_sensitive(tcon->password);
-#ifdef CONFIG_CIFS_DFS_UPCALL
-       dfs_put_root_smb_sessions(&tcon->dfs_ses_list);
-#endif
        kfree(tcon->origin_fullpath);
        kfree(tcon);
 }
@@ -487,6 +481,8 @@ is_valid_oplock_break(char *buffer, struct TCP_Server_Info *srv)
        /* look up tcon based on tid & uid */
        spin_lock(&cifs_tcp_ses_lock);
        list_for_each_entry(ses, &pserver->smb_ses_list, smb_ses_list) {
+               if (cifs_ses_exiting(ses))
+                       continue;
                list_for_each_entry(tcon, &ses->tcon_list, tcon_list) {
                        if (tcon->tid != buf->Tid)
                                continue;
index a9eaba8083b0d6b2745ebedb1d3d4705c0f4809d..212ec6f66ec65b15f50275803bbc38b0153c9645 100644 (file)
@@ -753,11 +753,11 @@ cifs_set_fid(struct cifsFileInfo *cfile, struct cifs_fid *fid, __u32 oplock)
        cinode->can_cache_brlcks = CIFS_CACHE_WRITE(cinode);
 }
 
-static void
+static int
 cifs_close_file(const unsigned int xid, struct cifs_tcon *tcon,
                struct cifs_fid *fid)
 {
-       CIFSSMBClose(xid, tcon, fid->netfid);
+       return CIFSSMBClose(xid, tcon, fid->netfid);
 }
 
 static int
index 82b84a4941dd2f05e8d516b54b6a209dbd7985d1..cc72be5a93a933b09c45256c2a7d7615478f54c0 100644 (file)
@@ -622,6 +622,8 @@ smb2_is_valid_lease_break(char *buffer, struct TCP_Server_Info *server)
        /* look up tcon based on tid & uid */
        spin_lock(&cifs_tcp_ses_lock);
        list_for_each_entry(ses, &pserver->smb_ses_list, smb_ses_list) {
+               if (cifs_ses_exiting(ses))
+                       continue;
                list_for_each_entry(tcon, &ses->tcon_list, tcon_list) {
                        spin_lock(&tcon->open_file_lock);
                        cifs_stats_inc(
@@ -697,6 +699,8 @@ smb2_is_valid_oplock_break(char *buffer, struct TCP_Server_Info *server)
        /* look up tcon based on tid & uid */
        spin_lock(&cifs_tcp_ses_lock);
        list_for_each_entry(ses, &pserver->smb_ses_list, smb_ses_list) {
+               if (cifs_ses_exiting(ses))
+                       continue;
                list_for_each_entry(tcon, &ses->tcon_list, tcon_list) {
 
                        spin_lock(&tcon->open_file_lock);
index 2ed456948f34ca0cd88dbe626429694456a681e2..b156eefa75d7cb4b13d1bf402234f08271a558ad 100644 (file)
@@ -1412,14 +1412,14 @@ smb2_set_fid(struct cifsFileInfo *cfile, struct cifs_fid *fid, __u32 oplock)
        memcpy(cfile->fid.create_guid, fid->create_guid, 16);
 }
 
-static void
+static int
 smb2_close_file(const unsigned int xid, struct cifs_tcon *tcon,
                struct cifs_fid *fid)
 {
-       SMB2_close(xid, tcon, fid->persistent_fid, fid->volatile_fid);
+       return SMB2_close(xid, tcon, fid->persistent_fid, fid->volatile_fid);
 }
 
-static void
+static int
 smb2_close_getattr(const unsigned int xid, struct cifs_tcon *tcon,
                   struct cifsFileInfo *cfile)
 {
@@ -1430,7 +1430,7 @@ smb2_close_getattr(const unsigned int xid, struct cifs_tcon *tcon,
        rc = __SMB2_close(xid, tcon, cfile->fid.persistent_fid,
                   cfile->fid.volatile_fid, &file_inf);
        if (rc)
-               return;
+               return rc;
 
        inode = d_inode(cfile->dentry);
 
@@ -1459,6 +1459,7 @@ smb2_close_getattr(const unsigned int xid, struct cifs_tcon *tcon,
 
        /* End of file and Attributes should not have to be updated on close */
        spin_unlock(&inode->i_lock);
+       return rc;
 }
 
 static int
@@ -2480,6 +2481,8 @@ smb2_is_network_name_deleted(char *buf, struct TCP_Server_Info *server)
 
        spin_lock(&cifs_tcp_ses_lock);
        list_for_each_entry(ses, &pserver->smb_ses_list, smb_ses_list) {
+               if (cifs_ses_exiting(ses))
+                       continue;
                list_for_each_entry(tcon, &ses->tcon_list, tcon_list) {
                        if (tcon->tid == le32_to_cpu(shdr->Id.SyncId.TreeId)) {
                                spin_lock(&tcon->tc_lock);
@@ -3913,7 +3916,7 @@ smb21_set_oplock_level(struct cifsInodeInfo *cinode, __u32 oplock,
                strcat(message, "W");
        }
        if (!new_oplock)
-               strncpy(message, "None", sizeof(message));
+               strscpy(message, "None");
 
        cinode->oplock = new_oplock;
        cifs_dbg(FYI, "%s Lease granted on inode %p\n", message,
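The `strncpy` → `strscpy` conversions above swap in a copier that always NUL-terminates and reports truncation. A minimal userspace sketch of those semantics (the `strscpy_demo` name and `-1` error value are stand-ins for the kernel's `strscpy()`/`-E2BIG`):

```c
#include <assert.h>
#include <string.h>

/* Userspace sketch of the kernel's strscpy(): copies at most size-1
 * bytes and always NUL-terminates, unlike strncpy(), which neither
 * guarantees termination nor reports truncation. */
static long strscpy_demo(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -1;		/* kernel returns -E2BIG */
	len = strnlen(src, size - 1);
	memcpy(dst, src, len);
	dst[len] = '\0';
	/* truncated if src had more bytes than we could copy */
	return strnlen(src, size) > len ? -1 : (long)len;
}
```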
index 3ea688558e6c9b96f454e29197023ec74f274e2a..c0c4933af5fc386911922b4e23c7869bdea8098b 100644 (file)
@@ -3628,9 +3628,9 @@ replay_again:
                        memcpy(&pbuf->network_open_info,
                               &rsp->network_open_info,
                               sizeof(pbuf->network_open_info));
+               atomic_dec(&tcon->num_remote_opens);
        }
 
-       atomic_dec(&tcon->num_remote_opens);
 close_exit:
        SMB2_close_free(&rqst);
        free_rsp_buf(resp_buftype, rsp);
index 5a3ca62d2f07f72584392975221cbc9b12276fe8..1d6e54f7879e6a5e8034a90d30471fecc02d2d1b 100644 (file)
@@ -659,7 +659,7 @@ smb2_sign_rqst(struct smb_rqst *rqst, struct TCP_Server_Info *server)
        }
        spin_unlock(&server->srv_lock);
        if (!is_binding && !server->session_estab) {
-               strncpy(shdr->Signature, "BSRSPYL", 8);
+               strscpy(shdr->Signature, "BSRSPYL");
                return 0;
        }
 
index 8ca8a45c4c621c5dabf56664bec27e5df0db8356..686b321c5a8bb5f0a1189023a3311e85aaf84c9e 100644 (file)
@@ -167,7 +167,8 @@ struct ksmbd_share_config_response {
        __u16   force_uid;
        __u16   force_gid;
        __s8    share_name[KSMBD_REQ_MAX_SHARE_NAME];
-       __u32   reserved[112];          /* Reserved room */
+       __u32   reserved[111];          /* Reserved room */
+       __u32   payload_sz;
        __u32   veto_list_sz;
        __s8    ____payload[];
 };
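The hunk above shrinks `reserved[]` from 112 to 111 entries to make room for the new `payload_sz` field, keeping the IPC struct size (and thus the userspace ABI) unchanged. A sanity sketch with hypothetical mirrored tail layouts:

```c
#include <stdint.h>

/* Hypothetical mirrors of the old and new trailing layout of
 * ksmbd_share_config_response: payload_sz is carved out of the
 * reserved area, so the overall size stays identical. */
struct tail_old {
	uint32_t reserved[112];
	uint32_t veto_list_sz;
};

struct tail_new {
	uint32_t reserved[111];
	uint32_t payload_sz;
	uint32_t veto_list_sz;
};
```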
index 328a412259dc1b935eb2ce3eee2977d39155157b..a2f0a2edceb8ae49852dde1b40cf7594df6084d9 100644 (file)
@@ -158,7 +158,12 @@ static struct ksmbd_share_config *share_config_request(struct unicode_map *um,
        share->name = kstrdup(name, GFP_KERNEL);
 
        if (!test_share_config_flag(share, KSMBD_SHARE_FLAG_PIPE)) {
-               share->path = kstrdup(ksmbd_share_config_path(resp),
+               int path_len = PATH_MAX;
+
+               if (resp->payload_sz)
+                       path_len = resp->payload_sz - resp->veto_list_sz;
+
+               share->path = kstrndup(ksmbd_share_config_path(resp), path_len,
                                      GFP_KERNEL);
                if (share->path)
                        share->path_sz = strlen(share->path);
index a45f7dca482e01897720cd69d90291cbaf3ff388..606aa3c5189a28de2e49e602e2c08362c91f46b7 100644 (file)
@@ -228,6 +228,11 @@ void init_smb3_0_server(struct ksmbd_conn *conn)
            conn->cli_cap & SMB2_GLOBAL_CAP_ENCRYPTION)
                conn->vals->capabilities |= SMB2_GLOBAL_CAP_ENCRYPTION;
 
+       if (server_conf.flags & KSMBD_GLOBAL_FLAG_SMB2_ENCRYPTION ||
+           (!(server_conf.flags & KSMBD_GLOBAL_FLAG_SMB2_ENCRYPTION_OFF) &&
+            conn->cli_cap & SMB2_GLOBAL_CAP_ENCRYPTION))
+               conn->vals->capabilities |= SMB2_GLOBAL_CAP_ENCRYPTION;
+
        if (server_conf.flags & KSMBD_GLOBAL_FLAG_SMB3_MULTICHANNEL)
                conn->vals->capabilities |= SMB2_GLOBAL_CAP_MULTI_CHANNEL;
 }
@@ -278,11 +283,6 @@ int init_smb3_11_server(struct ksmbd_conn *conn)
                conn->vals->capabilities |= SMB2_GLOBAL_CAP_LEASING |
                        SMB2_GLOBAL_CAP_DIRECTORY_LEASING;
 
-       if (server_conf.flags & KSMBD_GLOBAL_FLAG_SMB2_ENCRYPTION ||
-           (!(server_conf.flags & KSMBD_GLOBAL_FLAG_SMB2_ENCRYPTION_OFF) &&
-            conn->cli_cap & SMB2_GLOBAL_CAP_ENCRYPTION))
-               conn->vals->capabilities |= SMB2_GLOBAL_CAP_ENCRYPTION;
-
        if (server_conf.flags & KSMBD_GLOBAL_FLAG_SMB3_MULTICHANNEL)
                conn->vals->capabilities |= SMB2_GLOBAL_CAP_MULTI_CHANNEL;
 
index d478fa0c57abdbc7b8478624edf5133e202c85bf..5723bbf372d7cc93c9e1b2dbdd5082c2824f85f8 100644 (file)
@@ -5857,8 +5857,9 @@ static int smb2_rename(struct ksmbd_work *work,
        if (!file_info->ReplaceIfExists)
                flags = RENAME_NOREPLACE;
 
-       smb_break_all_levII_oplock(work, fp, 0);
        rc = ksmbd_vfs_rename(work, &fp->filp->f_path, new_name, flags);
+       if (!rc)
+               smb_break_all_levII_oplock(work, fp, 0);
 out:
        kfree(new_name);
        return rc;
index f29bb03f0dc47bfcb0fe3fc5c5acff16d5a314a8..8752ac82c557bf92985bd4d87a3e37f4cd4a60dc 100644 (file)
@@ -65,6 +65,7 @@ struct ipc_msg_table_entry {
        struct hlist_node       ipc_table_hlist;
 
        void                    *response;
+       unsigned int            msg_sz;
 };
 
 static struct delayed_work ipc_timer_work;
@@ -275,6 +276,7 @@ static int handle_response(int type, void *payload, size_t sz)
                }
 
                memcpy(entry->response, payload, sz);
+               entry->msg_sz = sz;
                wake_up_interruptible(&entry->wait);
                ret = 0;
                break;
@@ -453,6 +455,34 @@ out:
        return ret;
 }
 
+static int ipc_validate_msg(struct ipc_msg_table_entry *entry)
+{
+       unsigned int msg_sz = entry->msg_sz;
+
+       if (entry->type == KSMBD_EVENT_RPC_REQUEST) {
+               struct ksmbd_rpc_command *resp = entry->response;
+
+               msg_sz = sizeof(struct ksmbd_rpc_command) + resp->payload_sz;
+       } else if (entry->type == KSMBD_EVENT_SPNEGO_AUTHEN_REQUEST) {
+               struct ksmbd_spnego_authen_response *resp = entry->response;
+
+               msg_sz = sizeof(struct ksmbd_spnego_authen_response) +
+                               resp->session_key_len + resp->spnego_blob_len;
+       } else if (entry->type == KSMBD_EVENT_SHARE_CONFIG_REQUEST) {
+               struct ksmbd_share_config_response *resp = entry->response;
+
+               if (resp->payload_sz) {
+                       if (resp->payload_sz < resp->veto_list_sz)
+                               return -EINVAL;
+
+                       msg_sz = sizeof(struct ksmbd_share_config_response) +
+                                       resp->payload_sz;
+               }
+       }
+
+       return entry->msg_sz != msg_sz ? -EINVAL : 0;
+}
+
 static void *ipc_msg_send_request(struct ksmbd_ipc_msg *msg, unsigned int handle)
 {
        struct ipc_msg_table_entry entry;
@@ -477,6 +507,13 @@ static void *ipc_msg_send_request(struct ksmbd_ipc_msg *msg, unsigned int handle
        ret = wait_event_interruptible_timeout(entry.wait,
                                               entry.response != NULL,
                                               IPC_WAIT_TIMEOUT);
+       if (entry.response) {
+               ret = ipc_validate_msg(&entry);
+               if (ret) {
+                       kvfree(entry.response);
+                       entry.response = NULL;
+               }
+       }
 out:
        down_write(&ipc_msg_table_lock);
        hash_del(&entry.ipc_table_hlist);
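The `ipc_validate_msg()` helper added above cross-checks the received message length against the sizes the payload fields claim, so a malformed daemon response is dropped instead of being parsed past its buffer. A userspace sketch of the share-config branch (struct layout and names are simplified stand-ins, with `-1` in place of `-EINVAL`):

```c
#include <stddef.h>

/* Hypothetical mirror of ksmbd_share_config_response's size fields. */
struct share_config_resp {
	unsigned int payload_sz;	/* trailing bytes: veto list + path */
	unsigned int veto_list_sz;
};

/* Returns 0 if the received size matches what the payload fields
 * claim, -1 otherwise (the kernel code returns -EINVAL). */
static int validate_share_config(const struct share_config_resp *resp,
				 size_t msg_sz)
{
	if (resp->payload_sz) {
		if (resp->payload_sz < resp->veto_list_sz)
			return -1;	/* veto list larger than payload */
		if (msg_sz != sizeof(*resp) + resp->payload_sz)
			return -1;	/* claimed size != bytes received */
	}
	return 0;
}
```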
index 71d9779c42b10aca8bd4e0b7b667fc62386e2305..69ce6c600968479bd6832a6705352eb2d88427c1 100644 (file)
@@ -1515,29 +1515,11 @@ static int fs_bdev_thaw(struct block_device *bdev)
        return error;
 }
 
-static void fs_bdev_super_get(void *data)
-{
-       struct super_block *sb = data;
-
-       spin_lock(&sb_lock);
-       sb->s_count++;
-       spin_unlock(&sb_lock);
-}
-
-static void fs_bdev_super_put(void *data)
-{
-       struct super_block *sb = data;
-
-       put_super(sb);
-}
-
 const struct blk_holder_ops fs_holder_ops = {
        .mark_dead              = fs_bdev_mark_dead,
        .sync                   = fs_bdev_sync,
        .freeze                 = fs_bdev_freeze,
        .thaw                   = fs_bdev_thaw,
-       .get_holder             = fs_bdev_super_get,
-       .put_holder             = fs_bdev_super_put,
 };
 EXPORT_SYMBOL_GPL(fs_holder_ops);
 
@@ -1562,7 +1544,7 @@ int setup_bdev_super(struct super_block *sb, int sb_flags,
         * writable from userspace even for a read-only block device.
         */
        if ((mode & BLK_OPEN_WRITE) && bdev_read_only(bdev)) {
-               fput(bdev_file);
+               bdev_fput(bdev_file);
                return -EACCES;
        }
 
@@ -1573,7 +1555,7 @@ int setup_bdev_super(struct super_block *sb, int sb_flags,
        if (atomic_read(&bdev->bd_fsfreeze_count) > 0) {
                if (fc)
                        warnf(fc, "%pg: Can't mount, blockdev is frozen", bdev);
-               fput(bdev_file);
+               bdev_fput(bdev_file);
                return -EBUSY;
        }
        spin_lock(&sb_lock);
@@ -1693,7 +1675,7 @@ void kill_block_super(struct super_block *sb)
        generic_shutdown_super(sb);
        if (bdev) {
                sync_blockdev(bdev);
-               fput(sb->s_bdev_file);
+               bdev_fput(sb->s_bdev_file);
        }
 }
 
index 1a18c381127e2183169eaa8280aa620d66340a71..f0fa02264edaaeef2d23d1101d1953d6454e8832 100644 (file)
@@ -2030,7 +2030,7 @@ xfs_free_buftarg(
        fs_put_dax(btp->bt_daxdev, btp->bt_mount);
        /* the main block device is closed by kill_block_super */
        if (btp->bt_bdev != btp->bt_mount->m_super->s_bdev)
-               fput(btp->bt_bdev_file);
+               bdev_fput(btp->bt_bdev_file);
        kfree(btp);
 }
 
index ea48774f6b76d398bf6f9731145483751a4bf506..d55b42b2480d6c53f3367e4453cc69c5b80c6870 100644 (file)
@@ -1301,8 +1301,19 @@ xfs_link(
         */
        if (unlikely((tdp->i_diflags & XFS_DIFLAG_PROJINHERIT) &&
                     tdp->i_projid != sip->i_projid)) {
-               error = -EXDEV;
-               goto error_return;
+               /*
+                * Project quota setup skips special files which can
+                * leave inodes in a PROJINHERIT directory without a
+                * project ID set. We need to allow links to be made
+                * to these "project-less" inodes because userspace
+                * expects them to succeed after project ID setup,
+                * but everything else should be rejected.
+                */
+               if (!special_file(VFS_I(sip)->i_mode) ||
+                   sip->i_projid != 0) {
+                       error = -EXDEV;
+                       goto error_return;
+               }
        }
 
        if (!resblks) {
index c21f10ab0f5dbef4051b6bef01eb64c77247e056..bce020374c5eba5255a98d71176d7198024603d2 100644 (file)
@@ -485,7 +485,7 @@ xfs_open_devices(
                mp->m_logdev_targp = mp->m_ddev_targp;
                /* Handle won't be used, drop it */
                if (logdev_file)
-                       fput(logdev_file);
+                       bdev_fput(logdev_file);
        }
 
        return 0;
@@ -497,10 +497,10 @@ xfs_open_devices(
        xfs_free_buftarg(mp->m_ddev_targp);
  out_close_rtdev:
         if (rtdev_file)
-               fput(rtdev_file);
+               bdev_fput(rtdev_file);
  out_close_logdev:
        if (logdev_file)
-               fput(logdev_file);
+               bdev_fput(logdev_file);
        return error;
 }
 
index c3e8f7cf96be9e1c10169d2e7afe31696082eb8f..172c918799995f8ce6ed285bc5a8ea3341ddb6ed 100644 (file)
@@ -1505,16 +1505,6 @@ struct blk_holder_ops {
         * Thaw the file system mounted on the block device.
         */
        int (*thaw)(struct block_device *bdev);
-
-       /*
-        * If needed, get a reference to the holder.
-        */
-       void (*get_holder)(void *holder);
-
-       /*
-        * Release the holder.
-        */
-       void (*put_holder)(void *holder);
 };
 
 /*
@@ -1585,6 +1575,7 @@ static inline int early_lookup_bdev(const char *pathname, dev_t *dev)
 
 int bdev_freeze(struct block_device *bdev);
 int bdev_thaw(struct block_device *bdev);
+void bdev_fput(struct file *bdev_file);
 
 struct io_comp_batch {
        struct request *req_list;
index ca73940e26df83ddd65301b39b77c350b0f4c3c2..e5ee2c694401e0972d4a644f01ca523f63f83475 100644 (file)
@@ -10,6 +10,7 @@
 #ifdef __KERNEL__
 #include <linux/kernel.h>
 #include <linux/types.h>
+bool __init cmdline_has_extra_options(void);
 #else /* !__KERNEL__ */
 /*
  * NOTE: This is only for tools/bootconfig, because tools/bootconfig will
index cb0d6cd1c12f24e1dd8681b5f9f0302675bec7d5..60693a1458946223f791aae517210cb67ba13050 100644 (file)
@@ -90,6 +90,14 @@ enum cc_attr {
         * Examples include TDX Guest.
         */
        CC_ATTR_HOTPLUG_DISABLED,
+
+       /**
+        * @CC_ATTR_HOST_SEV_SNP: AMD SNP enabled on the host.
+        *
+        * The host kernel is running with the necessary features
+        * enabled to run SEV-SNP guests.
+        */
+       CC_ATTR_HOST_SEV_SNP,
 };
 
 #ifdef CONFIG_ARCH_HAS_CC_PLATFORM
@@ -107,10 +115,14 @@ enum cc_attr {
  * * FALSE - Specified Confidential Computing attribute is not active
  */
 bool cc_platform_has(enum cc_attr attr);
+void cc_platform_set(enum cc_attr attr);
+void cc_platform_clear(enum cc_attr attr);
 
 #else  /* !CONFIG_ARCH_HAS_CC_PLATFORM */
 
 static inline bool cc_platform_has(enum cc_attr attr) { return false; }
+static inline void cc_platform_set(enum cc_attr attr) { }
+static inline void cc_platform_clear(enum cc_attr attr) { }
 
 #endif /* CONFIG_ARCH_HAS_CC_PLATFORM */
 
index c00cc6c0878a1e173701a6267ac062fa3d41b790..8c252e073bd8103c8b13f39d90a965370b16d37d 100644 (file)
@@ -268,7 +268,7 @@ static inline void *offset_to_ptr(const int *off)
  *   - When one operand is a null pointer constant (i.e. when x is an integer
  *     constant expression) and the other is an object pointer (i.e. our
  *     third operand), the conditional operator returns the type of the
- *     object pointer operand (i.e. "int *). Here, within the sizeof(), we
+ *     object pointer operand (i.e. "int *"). Here, within the sizeof(), we
  *     would then get:
  *       sizeof(*((int *)(...))  == sizeof(int)  == 4
  *   - When one operand is a void pointer (i.e. when x is not an integer
index 97c4b046c09d9464243c81f294724985dc4a292a..b9f5464f44ed81134cb90032f3533f015d51389a 100644 (file)
@@ -1247,6 +1247,7 @@ void device_link_del(struct device_link *link);
 void device_link_remove(void *consumer, struct device *supplier);
 void device_links_supplier_sync_state_pause(void);
 void device_links_supplier_sync_state_resume(void);
+void device_link_wait_removal(void);
 
 /* Create alias, so I can be autoloaded. */
 #define MODULE_ALIAS_CHARDEV(major,minor) \
index 770755df852f14b31be1cefa5e37693acd6a0421..70cd7258cd29f5fc1396762f239908322ec97933 100644 (file)
@@ -245,7 +245,6 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
         * max utilization to the allowed CPU capacity before calculating
         * effective performance.
         */
-       max_util = map_util_perf(max_util);
        max_util = min(max_util, allowed_cpu_cap);
 
        /*
index 00fc429b0af0fb9bbab2382a9e347fdbac383981..8dfd53b52744a4dfffb8ccb350364972658f00eb 100644 (file)
@@ -121,6 +121,8 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 #define FMODE_PWRITE           ((__force fmode_t)0x10)
 /* File is opened for execution with sys_execve / sys_uselib */
 #define FMODE_EXEC             ((__force fmode_t)0x20)
+/* File writes are restricted (block device specific) */
+#define FMODE_WRITE_RESTRICTED  ((__force fmode_t)0x40)
 /* 32bit hashes as llseek() offset (for directories) */
 #define FMODE_32BITHASH         ((__force fmode_t)0x200)
 /* 64bit hashes as llseek() offset (for directories) */
index 868c8fb1bbc1c2dabd708bc2c6485c2e42dee8fe..13becafe41df00f94dddb5e4f0417d3447c6456c 100644 (file)
@@ -2,6 +2,8 @@
 #ifndef __LINUX_GFP_TYPES_H
 #define __LINUX_GFP_TYPES_H
 
+#include <linux/bits.h>
+
 /* The typedef is in types.h but we want the documentation here */
 #if 0
 /**
index e248936250852dd09c7e71b958a0072c1ac86193..05df0e399d7c0b84236198f57e0c61e90412beaa 100644 (file)
@@ -294,7 +294,6 @@ struct io_ring_ctx {
 
                struct io_submit_state  submit_state;
 
-               struct io_buffer_list   *io_bl;
                struct xarray           io_bl_xa;
 
                struct io_hash_table    cancel_table_locked;
index 0436b919f1c7fc535b30400bf95affc7e4534186..7b0ee64225de9cefebaf63e0d759c082684e3ab5 100644 (file)
@@ -2207,11 +2207,6 @@ static inline int arch_make_folio_accessible(struct folio *folio)
  */
 #include <linux/vmstat.h>
 
-static __always_inline void *lowmem_page_address(const struct page *page)
-{
-       return page_to_virt(page);
-}
-
 #if defined(CONFIG_HIGHMEM) && !defined(WANT_PAGE_VIRTUAL)
 #define HASHED_PAGE_VIRTUAL
 #endif
@@ -2234,6 +2229,11 @@ void set_page_address(struct page *page, void *virtual);
 void page_address_init(void);
 #endif
 
+static __always_inline void *lowmem_page_address(const struct page *page)
+{
+       return page_to_virt(page);
+}
+
 #if !defined(HASHED_PAGE_VIRTUAL) && !defined(WANT_PAGE_VIRTUAL)
 #define page_address(page) lowmem_page_address(page)
 #define set_page_address(page, address)  do { } while(0)
index 5d868505a94e43fe4f6124915e90dc814c269278..6d92b68efbf6c3afe86b9f6c22a8759ce51e7a28 100644 (file)
@@ -80,7 +80,7 @@ DECLARE_PER_CPU(u32, kstack_offset);
        if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
                                &randomize_kstack_offset)) {            \
                u32 offset = raw_cpu_read(kstack_offset);               \
-               offset ^= (rand);                                       \
+               offset = ror32(offset, 5) ^ (rand);                     \
                raw_cpu_write(kstack_offset, offset);                   \
        }                                                               \
 } while (0)
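The hunk above changes the per-CPU stack-offset mixing from a plain XOR to rotate-then-XOR: with `offset ^= rand`, two identical random values cancel each other out, while rotating the accumulator first keeps earlier entropy diffusing. A small sketch (the `_demo` names are stand-ins for the kernel's `ror32()` and the macro body):

```c
#include <stdint.h>

/* Rotate a 32-bit value right, like the kernel's ror32(). */
static uint32_t ror32_demo(uint32_t word, unsigned int shift)
{
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

/* New mixing: offset = ror32(offset, 5) ^ rand.
 * Old mixing was offset ^= rand, where repeated inputs cancel. */
static uint32_t mix_new(uint32_t offset, uint32_t rand)
{
	return ror32_demo(offset, 5) ^ rand;
}
```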
index 35f3a4a8ceb1e3276f3be9ebb68a1dd275f50de0..acf7e1a3f3def9fd4027aa659ba1d4dec8e944b0 100644 (file)
@@ -13,10 +13,10 @@ static inline bool folio_is_secretmem(struct folio *folio)
        /*
         * Using folio_mapping() is quite slow because of the actual call
         * instruction.
-        * We know that secretmem pages are not compound and LRU so we can
+        * We know that secretmem pages are not compound, so we can
         * save a couple of cycles here.
         */
-       if (folio_test_large(folio) || !folio_test_lru(folio))
+       if (folio_test_large(folio))
                return false;
 
        mapping = (struct address_space *)
index 3c6caa5abc7c4262b8d62f9ef46d3fa273e7fb1e..e9ec32fb97d4a729e06aaabc7cd5627bb384d72f 100644 (file)
@@ -44,10 +44,9 @@ typedef u32 depot_stack_handle_t;
 union handle_parts {
        depot_stack_handle_t handle;
        struct {
-               /* pool_index is offset by 1 */
-               u32 pool_index  : DEPOT_POOL_INDEX_BITS;
-               u32 offset      : DEPOT_OFFSET_BITS;
-               u32 extra       : STACK_DEPOT_EXTRA_BITS;
+               u32 pool_index_plus_1   : DEPOT_POOL_INDEX_BITS;
+               u32 offset              : DEPOT_OFFSET_BITS;
+               u32 extra               : STACK_DEPOT_EXTRA_BITS;
        };
 };
 
index c6540ceea14303151317627d3216d9f86a549341..0982d1d52b24d9cbb5da81ad2adaa9f4340e66e5 100644 (file)
@@ -22,7 +22,7 @@
  *
  * @read:              returns the current cycle value
  * @mask:              bitmask for two's complement
- *                     subtraction of non 64 bit counters,
+ *                     subtraction of non-64-bit counters,
  *                     see CYCLECOUNTER_MASK() helper macro
  * @mult:              cycle to nanosecond multiplier
  * @shift:             cycle to nanosecond divisor (power of two)
@@ -35,7 +35,7 @@ struct cyclecounter {
 };
 
 /**
- * struct timecounter - layer above a %struct cyclecounter which counts nanoseconds
+ * struct timecounter - layer above a &struct cyclecounter which counts nanoseconds
  *     Contains the state needed by timecounter_read() to detect
  *     cycle counter wrap around. Initialize with
  *     timecounter_init(). Also used to convert cycle counts into the
@@ -66,6 +66,8 @@ struct timecounter {
  * @cycles:    Cycles
  * @mask:      bit mask for maintaining the 'frac' field
  * @frac:      pointer to storage for the fractional nanoseconds.
+ *
+ * Returns: cycle counter cycles converted to nanoseconds
  */
 static inline u64 cyclecounter_cyc2ns(const struct cyclecounter *cc,
                                      u64 cycles, u64 mask, u64 *frac)
@@ -79,6 +81,7 @@ static inline u64 cyclecounter_cyc2ns(const struct cyclecounter *cc,
 
 /**
  * timecounter_adjtime - Shifts the time of the clock.
+ * @tc:                The &struct timecounter to adjust
  * @delta:     Desired change in nanoseconds.
  */
 static inline void timecounter_adjtime(struct timecounter *tc, s64 delta)
@@ -107,6 +110,8 @@ extern void timecounter_init(struct timecounter *tc,
  *
  * In other words, keeps track of time since the same epoch as
  * the function which generated the initial time stamp.
+ *
+ * Returns: nanoseconds since the initial time stamp
  */
 extern u64 timecounter_read(struct timecounter *tc);
 
@@ -123,6 +128,8 @@ extern u64 timecounter_read(struct timecounter *tc);
  *
  * This allows conversion of cycle counter values which were generated
  * in the past.
+ *
+ * Returns: cycle counter converted to nanoseconds since the initial time stamp
  */
 extern u64 timecounter_cyc2time(const struct timecounter *tc,
                                u64 cycle_tstamp);
index 7e50cbd97f86e3de4160373aaf1aff05179ed9e4..0ea7823b7f31f6c3cd653ddce0204a44e1f61a8f 100644 (file)
@@ -22,14 +22,14 @@ extern int do_sys_settimeofday64(const struct timespec64 *tv,
                                 const struct timezone *tz);
 
 /*
- * ktime_get() family: read the current time in a multitude of ways,
+ * ktime_get() family - read the current time in a multitude of ways.
  *
  * The default time reference is CLOCK_MONOTONIC, starting at
  * boot time but not counting the time spent in suspend.
  * For other references, use the functions with "real", "clocktai",
  * "boottime" and "raw" suffixes.
  *
- * To get the time in a different format, use the ones wit
+ * To get the time in a different format, use the ones with
  * "ns", "ts64" and "seconds" suffix.
  *
  * See Documentation/core-api/timekeeping.rst for more details.
@@ -74,6 +74,8 @@ extern u32 ktime_get_resolution_ns(void);
 
 /**
  * ktime_get_real - get the real (wall-) time in ktime_t format
+ *
+ * Returns: real (wall) time in ktime_t format
  */
 static inline ktime_t ktime_get_real(void)
 {
@@ -86,10 +88,12 @@ static inline ktime_t ktime_get_coarse_real(void)
 }
 
 /**
- * ktime_get_boottime - Returns monotonic time since boot in ktime_t format
+ * ktime_get_boottime - Get monotonic time since boot in ktime_t format
  *
  * This is similar to CLOCK_MONTONIC/ktime_get, but also includes the
  * time spent in suspend.
+ *
+ * Returns: monotonic time since boot in ktime_t format
  */
 static inline ktime_t ktime_get_boottime(void)
 {
@@ -102,7 +106,9 @@ static inline ktime_t ktime_get_coarse_boottime(void)
 }
 
 /**
- * ktime_get_clocktai - Returns the TAI time of day in ktime_t format
+ * ktime_get_clocktai - Get the TAI time of day in ktime_t format
+ *
+ * Returns: the TAI time of day in ktime_t format
  */
 static inline ktime_t ktime_get_clocktai(void)
 {
@@ -144,32 +150,60 @@ static inline u64 ktime_get_coarse_clocktai_ns(void)
 
 /**
  * ktime_mono_to_real - Convert monotonic time to clock realtime
+ * @mono: monotonic time to convert
+ *
+ * Returns: time converted to realtime clock
  */
 static inline ktime_t ktime_mono_to_real(ktime_t mono)
 {
        return ktime_mono_to_any(mono, TK_OFFS_REAL);
 }
 
+/**
+ * ktime_get_ns - Get the current time in nanoseconds
+ *
+ * Returns: current time converted to nanoseconds
+ */
 static inline u64 ktime_get_ns(void)
 {
        return ktime_to_ns(ktime_get());
 }
 
+/**
+ * ktime_get_real_ns - Get the current real/wall time in nanoseconds
+ *
+ * Returns: current real time converted to nanoseconds
+ */
 static inline u64 ktime_get_real_ns(void)
 {
        return ktime_to_ns(ktime_get_real());
 }
 
+/**
+ * ktime_get_boottime_ns - Get the monotonic time since boot in nanoseconds
+ *
+ * Returns: current boottime converted to nanoseconds
+ */
 static inline u64 ktime_get_boottime_ns(void)
 {
        return ktime_to_ns(ktime_get_boottime());
 }
 
+/**
+ * ktime_get_clocktai_ns - Get the current TAI time of day in nanoseconds
+ *
+ * Returns: current TAI time converted to nanoseconds
+ */
 static inline u64 ktime_get_clocktai_ns(void)
 {
        return ktime_to_ns(ktime_get_clocktai());
 }
 
+/**
+ * ktime_get_raw_ns - Get the raw monotonic time in nanoseconds
+ *
+ * Returns: current raw monotonic time converted to nanoseconds
+ */
 static inline u64 ktime_get_raw_ns(void)
 {
        return ktime_to_ns(ktime_get_raw());
@@ -224,8 +258,8 @@ extern bool timekeeping_rtc_skipresume(void);
 
 extern void timekeeping_inject_sleeptime64(const struct timespec64 *delta);
 
-/*
- * struct ktime_timestanps - Simultaneous mono/boot/real timestamps
+/**
+ * struct ktime_timestamps - Simultaneous mono/boot/real timestamps
  * @mono:      Monotonic timestamp
  * @boot:      Boottime timestamp
  * @real:      Realtime timestamp
@@ -242,7 +276,8 @@ struct ktime_timestamps {
  * @cycles:    Clocksource counter value to produce the system times
  * @real:      Realtime system time
  * @raw:       Monotonic raw system time
- * @clock_was_set_seq: The sequence number of clock was set events
+ * @cs_id:     Clocksource ID
+ * @clock_was_set_seq: The sequence number of clock-was-set events
  * @cs_was_changed_seq:        The sequence number of clocksource change events
  */
 struct system_time_snapshot {
index 14a633ba61d6433400a488ff21d73172f3616281..e67ecd1cbc97d6b92994c15b688cdde5ec3c998f 100644 (file)
@@ -22,7 +22,7 @@
 #define __TIMER_LOCKDEP_MAP_INITIALIZER(_kn)
 #endif
 
-/**
+/*
  * @TIMER_DEFERRABLE: A deferrable timer will work normally when the
  * system is busy, but will not cause a CPU to come out of idle just
  * to service it; instead, the timer will be serviced when the CPU
@@ -140,7 +140,7 @@ static inline void destroy_timer_on_stack(struct timer_list *timer) { }
  * or not. Callers must ensure serialization wrt. other operations done
  * to this timer, eg. interrupt contexts, or other CPUs on SMP.
  *
- * return value: 1 if the timer is pending, 0 if not.
+ * Returns: 1 if the timer is pending, 0 if not.
  */
 static inline int timer_pending(const struct timer_list * timer)
 {
@@ -175,6 +175,10 @@ extern int timer_shutdown(struct timer_list *timer);
  * See timer_delete_sync() for detailed explanation.
  *
  * Do not use in new code. Use timer_delete_sync() instead.
+ *
+ * Returns:
+ * * %0        - The timer was not pending
+ * * %1        - The timer was pending and deactivated
  */
 static inline int del_timer_sync(struct timer_list *timer)
 {
@@ -188,6 +192,10 @@ static inline int del_timer_sync(struct timer_list *timer)
  * See timer_delete() for detailed explanation.
  *
  * Do not use in new code. Use timer_delete() instead.
+ *
+ * Returns:
+ * * %0        - The timer was not pending
+ * * %1        - The timer was pending and deactivated
  */
 static inline int del_timer(struct timer_list *timer)
 {
index 9fe95a22abeb7e2fb12d7384a974e5689db61211..eaec5d6caa29d293902f86666c91cceebd6f388c 100644 (file)
@@ -585,6 +585,15 @@ static inline struct sk_buff *bt_skb_sendmmsg(struct sock *sk,
        return skb;
 }
 
+static inline int bt_copy_from_sockptr(void *dst, size_t dst_size,
+                                      sockptr_t src, size_t src_size)
+{
+       if (dst_size > src_size)
+               return -EINVAL;
+
+       return copy_from_sockptr(dst, src, dst_size);
+}
+
 int bt_to_errno(u16 code);
 __u8 bt_status(int err);
 
index a8bebac1e4b28dd3d0195894dc96e18f74184992..957295364a5e3c1aa3bc8a9108a7da02e7b6ee44 100644 (file)
@@ -56,6 +56,9 @@ struct hdac_ext_stream {
        u32 pphcldpl;
        u32 pphcldpu;
 
+       u32 pplcllpl;
+       u32 pplcllpu;
+
        bool decoupled:1;
        bool link_locked:1;
        bool link_prepared;
index 4038dd421150a3f03182be5d1f0a503812029c15..1dc59005d241fbe683fb98d62d86e8c04f3708eb 100644 (file)
@@ -15,7 +15,7 @@
 #ifndef __TAS2781_TLV_H__
 #define __TAS2781_TLV_H__
 
-static const DECLARE_TLV_DB_SCALE(dvc_tlv, -10000, 100, 0);
+static const __maybe_unused DECLARE_TLV_DB_SCALE(dvc_tlv, -10000, 100, 0);
 static const DECLARE_TLV_DB_SCALE(amp_vol_tlv, 1100, 50, 0);
 
 #endif
index 5d5c0b8efff2d44be1c29c66a9c914ed03d5b95f..c71ddb6d46914be3d91758608e92248aea98fcba 100644 (file)
 #include <vdso/time32.h>
 #include <vdso/time64.h>
 
-#ifdef CONFIG_ARM64
-#include <asm/page-def.h>
-#else
-#include <asm/page.h>
-#endif
-
 #ifdef CONFIG_ARCH_HAS_VDSO_DATA
 #include <asm/vdso/data.h>
 #else
@@ -132,7 +126,7 @@ extern struct vdso_data _timens_data[CS_BASES] __attribute__((visibility("hidden
  */
 union vdso_data_store {
        struct vdso_data        data[CS_BASES];
-       u8                      page[PAGE_SIZE];
+       u8                      page[1U << CONFIG_PAGE_SHIFT];
 };
 
 /*
index 3127e0bf7bbd15487ea6b50bd277801f6c01ae7e..a298a3854a8018923dbbd26adbc271550ef9a198 100644 (file)
@@ -367,7 +367,7 @@ static int __init do_name(void)
        if (S_ISREG(mode)) {
                int ml = maybe_link();
                if (ml >= 0) {
-                       int openflags = O_WRONLY|O_CREAT;
+                       int openflags = O_WRONLY|O_CREAT|O_LARGEFILE;
                        if (ml != 1)
                                openflags |= O_TRUNC;
                        wfile = filp_open(collected, openflags, mode);
index 2ca52474d0c3032e44ae410add7b2430218f9c41..881f6230ee59e9675eb98b62adf761ee74823a16 100644 (file)
@@ -487,6 +487,11 @@ static int __init warn_bootconfig(char *str)
 
 early_param("bootconfig", warn_bootconfig);
 
+bool __init cmdline_has_extra_options(void)
+{
+       return extra_command_line || extra_init_args;
+}
+
 /* Change NUL term back to "=", to make "param" the whole string. */
 static void __init repair_env_string(char *param, char *val)
 {
index 5d4b448fdc503822cb97ca5ca93d545ade642db5..4521c2b66b98db3c3affc55c7aeb4a69b8eec0a7 100644 (file)
@@ -147,6 +147,7 @@ static bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
 static void io_queue_sqe(struct io_kiocb *req);
 
 struct kmem_cache *req_cachep;
+static struct workqueue_struct *iou_wq __ro_after_init;
 
 static int __read_mostly sysctl_io_uring_disabled;
 static int __read_mostly sysctl_io_uring_group = -1;
@@ -350,7 +351,6 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 err:
        kfree(ctx->cancel_table.hbs);
        kfree(ctx->cancel_table_locked.hbs);
-       kfree(ctx->io_bl);
        xa_destroy(&ctx->io_bl_xa);
        kfree(ctx);
        return NULL;
@@ -1982,10 +1982,15 @@ fail:
                err = -EBADFD;
                if (!io_file_can_poll(req))
                        goto fail;
-               err = -ECANCELED;
-               if (io_arm_poll_handler(req, issue_flags) != IO_APOLL_OK)
-                       goto fail;
-               return;
+               if (req->file->f_flags & O_NONBLOCK ||
+                   req->file->f_mode & FMODE_NOWAIT) {
+                       err = -ECANCELED;
+                       if (io_arm_poll_handler(req, issue_flags) != IO_APOLL_OK)
+                               goto fail;
+                       return;
+               } else {
+                       req->flags &= ~REQ_F_APOLL_MULTISHOT;
+               }
        }
 
        if (req->flags & REQ_F_FORCE_ASYNC) {
@@ -2926,7 +2931,6 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
        io_napi_free(ctx);
        kfree(ctx->cancel_table.hbs);
        kfree(ctx->cancel_table_locked.hbs);
-       kfree(ctx->io_bl);
        xa_destroy(&ctx->io_bl_xa);
        kfree(ctx);
 }
@@ -3161,7 +3165,7 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
         * noise and overhead, there's no discernable change in runtime
         * over using system_wq.
         */
-       queue_work(system_unbound_wq, &ctx->exit_work);
+       queue_work(iou_wq, &ctx->exit_work);
 }
 
 static int io_uring_release(struct inode *inode, struct file *file)
@@ -3443,14 +3447,15 @@ static void *io_uring_validate_mmap_request(struct file *file,
                ptr = ctx->sq_sqes;
                break;
        case IORING_OFF_PBUF_RING: {
+               struct io_buffer_list *bl;
                unsigned int bgid;
 
                bgid = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;
-               rcu_read_lock();
-               ptr = io_pbuf_get_address(ctx, bgid);
-               rcu_read_unlock();
-               if (!ptr)
-                       return ERR_PTR(-EINVAL);
+               bl = io_pbuf_get_bl(ctx, bgid);
+               if (IS_ERR(bl))
+                       return bl;
+               ptr = bl->buf_ring;
+               io_put_bl(ctx, bl);
                break;
                }
        default:
@@ -4185,6 +4190,8 @@ static int __init io_uring_init(void)
        io_buf_cachep = KMEM_CACHE(io_buffer,
                                          SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT);
 
+       iou_wq = alloc_workqueue("iou_exit", WQ_UNBOUND, 64);
+
 #ifdef CONFIG_SYSCTL
        register_sysctl_init("kernel", kernel_io_uring_disabled_table);
 #endif
index 693c26da4ee1a36b4b0acee0b0cc7a7d0cfde3e6..3aa16e27f5099a426abe1f991c991ffdd9ab379f 100644 (file)
@@ -17,8 +17,6 @@
 
 #define IO_BUFFER_LIST_BUF_PER_PAGE (PAGE_SIZE / sizeof(struct io_uring_buf))
 
-#define BGID_ARRAY     64
-
 /* BIDs are addressed by a 16-bit field in a CQE */
 #define MAX_BIDS_PER_BGID (1 << 16)
 
@@ -40,13 +38,9 @@ struct io_buf_free {
        int                             inuse;
 };
 
-static struct io_buffer_list *__io_buffer_get_list(struct io_ring_ctx *ctx,
-                                                  struct io_buffer_list *bl,
-                                                  unsigned int bgid)
+static inline struct io_buffer_list *__io_buffer_get_list(struct io_ring_ctx *ctx,
+                                                         unsigned int bgid)
 {
-       if (bl && bgid < BGID_ARRAY)
-               return &bl[bgid];
-
        return xa_load(&ctx->io_bl_xa, bgid);
 }
 
@@ -55,7 +49,7 @@ static inline struct io_buffer_list *io_buffer_get_list(struct io_ring_ctx *ctx,
 {
        lockdep_assert_held(&ctx->uring_lock);
 
-       return __io_buffer_get_list(ctx, ctx->io_bl, bgid);
+       return __io_buffer_get_list(ctx, bgid);
 }
 
 static int io_buffer_add_list(struct io_ring_ctx *ctx,
@@ -67,11 +61,7 @@ static int io_buffer_add_list(struct io_ring_ctx *ctx,
         * always under the ->uring_lock, but the RCU lookup from mmap does.
         */
        bl->bgid = bgid;
-       smp_store_release(&bl->is_ready, 1);
-
-       if (bgid < BGID_ARRAY)
-               return 0;
-
+       atomic_set(&bl->refs, 1);
        return xa_err(xa_store(&ctx->io_bl_xa, bgid, bl, GFP_KERNEL));
 }
 
@@ -208,24 +198,6 @@ void __user *io_buffer_select(struct io_kiocb *req, size_t *len,
        return ret;
 }
 
-static __cold int io_init_bl_list(struct io_ring_ctx *ctx)
-{
-       struct io_buffer_list *bl;
-       int i;
-
-       bl = kcalloc(BGID_ARRAY, sizeof(struct io_buffer_list), GFP_KERNEL);
-       if (!bl)
-               return -ENOMEM;
-
-       for (i = 0; i < BGID_ARRAY; i++) {
-               INIT_LIST_HEAD(&bl[i].buf_list);
-               bl[i].bgid = i;
-       }
-
-       smp_store_release(&ctx->io_bl, bl);
-       return 0;
-}
-
 /*
  * Mark the given mapped range as free for reuse
  */
@@ -294,24 +266,24 @@ static int __io_remove_buffers(struct io_ring_ctx *ctx,
        return i;
 }
 
+void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl)
+{
+       if (atomic_dec_and_test(&bl->refs)) {
+               __io_remove_buffers(ctx, bl, -1U);
+               kfree_rcu(bl, rcu);
+       }
+}
+
 void io_destroy_buffers(struct io_ring_ctx *ctx)
 {
        struct io_buffer_list *bl;
        struct list_head *item, *tmp;
        struct io_buffer *buf;
        unsigned long index;
-       int i;
-
-       for (i = 0; i < BGID_ARRAY; i++) {
-               if (!ctx->io_bl)
-                       break;
-               __io_remove_buffers(ctx, &ctx->io_bl[i], -1U);
-       }
 
        xa_for_each(&ctx->io_bl_xa, index, bl) {
                xa_erase(&ctx->io_bl_xa, bl->bgid);
-               __io_remove_buffers(ctx, bl, -1U);
-               kfree_rcu(bl, rcu);
+               io_put_bl(ctx, bl);
        }
 
        /*
@@ -489,12 +461,6 @@ int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags)
 
        io_ring_submit_lock(ctx, issue_flags);
 
-       if (unlikely(p->bgid < BGID_ARRAY && !ctx->io_bl)) {
-               ret = io_init_bl_list(ctx);
-               if (ret)
-                       goto err;
-       }
-
        bl = io_buffer_get_list(ctx, p->bgid);
        if (unlikely(!bl)) {
                bl = kzalloc(sizeof(*bl), GFP_KERNEL_ACCOUNT);
@@ -507,14 +473,9 @@ int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags)
                if (ret) {
                        /*
                         * Doesn't need rcu free as it was never visible, but
-                        * let's keep it consistent throughout. Also can't
-                        * be a lower indexed array group, as adding one
-                        * where lookup failed cannot happen.
+                        * let's keep it consistent throughout.
                         */
-                       if (p->bgid >= BGID_ARRAY)
-                               kfree_rcu(bl, rcu);
-                       else
-                               WARN_ON_ONCE(1);
+                       kfree_rcu(bl, rcu);
                        goto err;
                }
        }
@@ -679,12 +640,6 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
        if (reg.ring_entries >= 65536)
                return -EINVAL;
 
-       if (unlikely(reg.bgid < BGID_ARRAY && !ctx->io_bl)) {
-               int ret = io_init_bl_list(ctx);
-               if (ret)
-                       return ret;
-       }
-
        bl = io_buffer_get_list(ctx, reg.bgid);
        if (bl) {
                /* if mapped buffer ring OR classic exists, don't allow */
@@ -733,11 +688,8 @@ int io_unregister_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
        if (!bl->is_buf_ring)
                return -EINVAL;
 
-       __io_remove_buffers(ctx, bl, -1U);
-       if (bl->bgid >= BGID_ARRAY) {
-               xa_erase(&ctx->io_bl_xa, bl->bgid);
-               kfree_rcu(bl, rcu);
-       }
+       xa_erase(&ctx->io_bl_xa, bl->bgid);
+       io_put_bl(ctx, bl);
        return 0;
 }
 
@@ -767,23 +719,35 @@ int io_register_pbuf_status(struct io_ring_ctx *ctx, void __user *arg)
        return 0;
 }
 
-void *io_pbuf_get_address(struct io_ring_ctx *ctx, unsigned long bgid)
+struct io_buffer_list *io_pbuf_get_bl(struct io_ring_ctx *ctx,
+                                     unsigned long bgid)
 {
        struct io_buffer_list *bl;
+       bool ret;
 
-       bl = __io_buffer_get_list(ctx, smp_load_acquire(&ctx->io_bl), bgid);
-
-       if (!bl || !bl->is_mmap)
-               return NULL;
        /*
-        * Ensure the list is fully setup. Only strictly needed for RCU lookup
-        * via mmap, and in that case only for the array indexed groups. For
-        * the xarray lookups, it's either visible and ready, or not at all.
+        * We have to be a bit careful here - we're inside mmap and cannot grab
+        * the uring_lock. This means the buffer_list could be simultaneously
+        * going away, if someone is trying to be sneaky. Look it up under rcu
+        * so we know it's not going away, and attempt to grab a reference to
+        * it. If the ref is already zero, then fail the mapping. If successful,
+        * the caller will call io_put_bl() to drop the reference at the
+        * end. This may then safely free the buffer_list (and drop the pages)
+        * at that point, vm_insert_pages() would've already grabbed the
+        * necessary vma references.
         */
-       if (!smp_load_acquire(&bl->is_ready))
-               return NULL;
-
-       return bl->buf_ring;
+       rcu_read_lock();
+       bl = xa_load(&ctx->io_bl_xa, bgid);
+       /* must be a mmap'able buffer ring and have pages */
+       ret = false;
+       if (bl && bl->is_mmap)
+               ret = atomic_inc_not_zero(&bl->refs);
+       rcu_read_unlock();
+
+       if (ret)
+               return bl;
+
+       return ERR_PTR(-EINVAL);
 }
 
 /*
index 1c7b654ee7263af16e1491934dcbc21744a4a337..df365b8860cf1eeb7eff261e1f5a5a7fc8d9b77f 100644 (file)
@@ -25,12 +25,12 @@ struct io_buffer_list {
        __u16 head;
        __u16 mask;
 
+       atomic_t refs;
+
        /* ring mapped provided buffers */
        __u8 is_buf_ring;
        /* ring mapped provided buffers, but mmap'ed by application */
        __u8 is_mmap;
-       /* bl is visible from an RCU point of view for lookup */
-       __u8 is_ready;
 };
 
 struct io_buffer {
@@ -61,7 +61,9 @@ void __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags);
 
 bool io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags);
 
-void *io_pbuf_get_address(struct io_ring_ctx *ctx, unsigned long bgid);
+void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl);
+struct io_buffer_list *io_pbuf_get_bl(struct io_ring_ctx *ctx,
+                                     unsigned long bgid);
 
 static inline bool io_kbuf_recycle_ring(struct io_kiocb *req)
 {
index 0585ebcc9773d3349b007e6ff436c0320e26356d..c8d48287439e5a06d19113ecb07f1c05db47dc3f 100644 (file)
@@ -936,6 +936,13 @@ int io_read_mshot(struct io_kiocb *req, unsigned int issue_flags)
 
        ret = __io_read(req, issue_flags);
 
+       /*
+        * If the file doesn't support proper NOWAIT, then disable multishot
+        * and stay in single shot mode.
+        */
+       if (!io_file_supports_nowait(req))
+               req->flags &= ~REQ_F_APOLL_MULTISHOT;
+
        /*
         * If we get -EAGAIN, recycle our buffer and just let normal poll
         * handling arm it.
@@ -955,7 +962,7 @@ int io_read_mshot(struct io_kiocb *req, unsigned int issue_flags)
        /*
         * Any successful return value will keep the multishot read armed.
         */
-       if (ret > 0) {
+       if (ret > 0 && req->flags & REQ_F_APOLL_MULTISHOT) {
                /*
                 * Put our buffer and post a CQE. If we fail to post a CQE, then
                 * jump to the termination path. This request is then done.
index 9d9095e817928658d2c6d54d5da6f4826ff7c6be..65adc815fc6e63027e1b7f0b23c597475a3fea1e 100644 (file)
@@ -1567,10 +1567,17 @@ static int check_kprobe_address_safe(struct kprobe *p,
        jump_label_lock();
        preempt_disable();
 
-       /* Ensure it is not in reserved area nor out of text */
-       if (!(core_kernel_text((unsigned long) p->addr) ||
-           is_module_text_address((unsigned long) p->addr)) ||
-           in_gate_area_no_mm((unsigned long) p->addr) ||
+       /* Ensure the address is in a text area, and find the module if it exists. */
+       *probed_mod = NULL;
+       if (!core_kernel_text((unsigned long) p->addr)) {
+               *probed_mod = __module_text_address((unsigned long) p->addr);
+               if (!(*probed_mod)) {
+                       ret = -EINVAL;
+                       goto out;
+               }
+       }
+       /* Ensure it is not in reserved area. */
+       if (in_gate_area_no_mm((unsigned long) p->addr) ||
            within_kprobe_blacklist((unsigned long) p->addr) ||
            jump_label_text_reserved(p->addr, p->addr) ||
            static_call_text_reserved(p->addr, p->addr) ||
@@ -1580,8 +1587,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
                goto out;
        }
 
-       /* Check if 'p' is probing a module. */
-       *probed_mod = __module_text_address((unsigned long) p->addr);
+       /* Get module refcount and reject __init functions for loaded modules. */
        if (*probed_mod) {
                /*
                 * We must hold a refcount of the probed module while updating
index 269e21590df5368bd6fc84d422f44162501a3f31..1331216a9cae749cce5e13b7ff4adcf9ba5fefaa 100644 (file)
@@ -697,6 +697,7 @@ bool tick_nohz_tick_stopped_cpu(int cpu)
 
 /**
  * tick_nohz_update_jiffies - update jiffies when idle was interrupted
+ * @now: current ktime_t
  *
  * Called from interrupt entry when the CPU was idle
  *
@@ -794,7 +795,7 @@ static u64 get_cpu_sleep_time_us(struct tick_sched *ts, ktime_t *sleeptime,
  * This time is measured via accounting rather than sampling,
  * and is as accurate as ktime_get() is.
  *
- * This function returns -1 if NOHZ is not enabled.
+ * Return: -1 if NOHZ is not enabled, else total idle time of the @cpu
  */
 u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
 {
@@ -820,7 +821,7 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
  * This time is measured via accounting rather than sampling,
  * and is as accurate as ktime_get() is.
  *
- * This function returns -1 if NOHZ is not enabled.
+ * Return: -1 if NOHZ is not enabled, else total iowait time of @cpu
  */
 u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
 {
@@ -1287,6 +1288,8 @@ void tick_nohz_irq_exit(void)
 
 /**
  * tick_nohz_idle_got_tick - Check whether or not the tick handler has run
+ *
+ * Return: %true if the tick handler has run, otherwise %false
  */
 bool tick_nohz_idle_got_tick(void)
 {
@@ -1305,6 +1308,8 @@ bool tick_nohz_idle_got_tick(void)
  * stopped, it returns the next hrtimer.
  *
  * Called from power state control code with interrupts disabled
+ *
+ * Return: the next expiration time
  */
 ktime_t tick_nohz_get_next_hrtimer(void)
 {
@@ -1320,6 +1325,8 @@ ktime_t tick_nohz_get_next_hrtimer(void)
  * The return value of this function and/or the value returned by it through the
  * @delta_next pointer can be negative which must be taken into account by its
  * callers.
+ *
+ * Return: the expected length of the current sleep
  */
 ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next)
 {
@@ -1357,8 +1364,11 @@ ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next)
 /**
  * tick_nohz_get_idle_calls_cpu - return the current idle calls counter value
  * for a particular CPU.
+ * @cpu: target CPU number
  *
  * Called from the schedutil frequency scaling governor in scheduler context.
+ *
+ * Return: the current idle calls counter value for @cpu
  */
 unsigned long tick_nohz_get_idle_calls_cpu(int cpu)
 {
@@ -1371,6 +1381,8 @@ unsigned long tick_nohz_get_idle_calls_cpu(int cpu)
  * tick_nohz_get_idle_calls - return the current idle calls counter value
  *
  * Called from the schedutil frequency scaling governor in scheduler context.
+ *
+ * Return: the current idle calls counter value for the current CPU
  */
 unsigned long tick_nohz_get_idle_calls(void)
 {
@@ -1559,7 +1571,7 @@ early_param("skew_tick", skew_tick);
 
 /**
  * tick_setup_sched_timer - setup the tick emulation timer
- * @mode: tick_nohz_mode to setup for
+ * @hrtimer: whether to use the hrtimer or not
  */
 void tick_setup_sched_timer(bool hrtimer)
 {
index e11c4dc65bcb24b3b4200c83d538e5a800ad4fc2..b4a7822f495d3460b636089dc5024cd92b61a467 100644 (file)
@@ -46,8 +46,8 @@ struct tick_device {
  * @next_tick:         Next tick to be fired when in dynticks mode.
  * @idle_jiffies:      jiffies at the entry to idle for idle time accounting
  * @idle_waketime:     Time when the idle was interrupted
+ * @idle_sleeptime_seq:        sequence counter for data consistency
  * @idle_entrytime:    Time when the idle call was entered
- * @nohz_mode:         Mode - one state of tick_nohz_mode
  * @last_jiffies:      Base jiffies snapshot when next event was last computed
  * @timer_expires_base:        Base time clock monotonic for @timer_expires
  * @timer_expires:     Anticipated timer expiration time (in case sched tick is stopped)
index dee29f1f5b75f3c059f831a2cd1c86cd8907fef9..3baf2fbe6848f03efb7418c5d5f10b279b30cf6c 100644 (file)
@@ -64,15 +64,15 @@ EXPORT_SYMBOL(jiffies_64);
 
 /*
  * The timer wheel has LVL_DEPTH array levels. Each level provides an array of
- * LVL_SIZE buckets. Each level is driven by its own clock and therefor each
+ * LVL_SIZE buckets. Each level is driven by its own clock and therefore each
  * level has a different granularity.
  *
- * The level granularity is:           LVL_CLK_DIV ^ lvl
+ * The level granularity is:           LVL_CLK_DIV ^ level
  * The level clock frequency is:       HZ / (LVL_CLK_DIV ^ level)
  *
  * The array level of a newly armed timer depends on the relative expiry
  * time. The farther the expiry time is away the higher the array level and
- * therefor the granularity becomes.
+ * therefore the granularity becomes.
  *
  * Contrary to the original timer wheel implementation, which aims for 'exact'
  * expiry of the timers, this implementation removes the need for recascading
@@ -207,7 +207,7 @@ EXPORT_SYMBOL(jiffies_64);
  * struct timer_base - Per CPU timer base (number of base depends on config)
  * @lock:              Lock protecting the timer_base
  * @running_timer:     When expiring timers, the lock is dropped. To make
- *                     sure not to race agains deleting/modifying a
+ *                     sure not to race against deleting/modifying a
  *                     currently running timer, the pointer is set to the
  *                     timer, which expires at the moment. If no timer is
  *                     running, the pointer is NULL.
@@ -737,7 +737,7 @@ static bool timer_is_static_object(void *addr)
 }
 
 /*
- * fixup_init is called when:
+ * timer_fixup_init is called when:
  * - an active object is initialized
  */
 static bool timer_fixup_init(void *addr, enum debug_obj_state state)
@@ -761,7 +761,7 @@ static void stub_timer(struct timer_list *unused)
 }
 
 /*
- * fixup_activate is called when:
+ * timer_fixup_activate is called when:
  * - an active object is activated
  * - an unknown non-static object is activated
  */
@@ -783,7 +783,7 @@ static bool timer_fixup_activate(void *addr, enum debug_obj_state state)
 }
 
 /*
- * fixup_free is called when:
+ * timer_fixup_free is called when:
  * - an active object is freed
  */
 static bool timer_fixup_free(void *addr, enum debug_obj_state state)
@@ -801,7 +801,7 @@ static bool timer_fixup_free(void *addr, enum debug_obj_state state)
 }
 
 /*
- * fixup_assert_init is called when:
+ * timer_fixup_assert_init is called when:
  * - an untracked/uninit-ed object is found
  */
 static bool timer_fixup_assert_init(void *addr, enum debug_obj_state state)
@@ -914,7 +914,7 @@ static void do_init_timer(struct timer_list *timer,
  * @key: lockdep class key of the fake lock used for tracking timer
  *       sync lock dependencies
  *
- * init_timer_key() must be done to a timer prior calling *any* of the
+ * init_timer_key() must be done to a timer prior to calling *any* of the
  * other timer functions.
  */
 void init_timer_key(struct timer_list *timer,
@@ -1417,7 +1417,7 @@ static int __timer_delete(struct timer_list *timer, bool shutdown)
         * If @shutdown is set then the lock has to be taken whether the
         * timer is pending or not to protect against a concurrent rearm
         * which might hit between the lockless pending check and the lock
-        * aquisition. By taking the lock it is ensured that such a newly
+        * acquisition. By taking the lock it is ensured that such a newly
         * enqueued timer is dequeued and cannot end up with
         * timer->function == NULL in the expiry code.
         *
@@ -2306,7 +2306,7 @@ static inline u64 __get_next_timer_interrupt(unsigned long basej, u64 basem,
 
                /*
                 * When timer base is not set idle, undo the effect of
-                * tmigr_cpu_deactivate() to prevent inconsitent states - active
+                * tmigr_cpu_deactivate() to prevent inconsistent states - active
                 * timer base but inactive timer migration hierarchy.
                 *
                 * When timer base was already marked idle, nothing will be
index c63a0afdcebed5c1e8b7bff161647d194f810ef2..ccba875d2234fe582264e7d802dcb62f4864e4f6 100644 (file)
@@ -751,6 +751,33 @@ bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
 
                first_childevt = evt = data->evt;
 
+               /*
+                * Walking the hierarchy is required in any case when a
+                * remote expiry was done before. This ensures to not lose
+                * already queued events in non active groups (see section
+                * "Required event and timerqueue update after a remote
+                * expiry" in the documentation at the top).
+                *
+                * The two call sites which are executed without a remote expiry
+                * before, are not prevented from propagating changes through
+                * the hierarchy by the return:
+                *  - When entering this path by tmigr_new_timer(), @evt->ignore
+                *    is never set.
+                *  - tmigr_inactive_up() takes care of the propagation by
+                *    itself and ignores the return value. But an immediate
+                *    return is possible if there is a parent, sparing group
+                *    locking at this level, because the upper walking call to
+                *    the parent will take care about removing this event from
+                *    within the group and update next_expiry accordingly.
+                *
+                * However, if there is no parent, i.e. the hierarchy has only a
+                * single level so @group is the top level group, make sure the
+                * group's first event information is updated and handled
+                * properly, so skip this fast return path.
+                */
+               if (evt->ignore && !remote && group->parent)
+                       return true;
+
                raw_spin_lock(&group->lock);
 
                childstate.state = 0;
@@ -762,8 +789,11 @@ bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
         * queue when the expiry time changed only or when it could be ignored.
         */
        if (timerqueue_node_queued(&evt->nextevt)) {
-               if ((evt->nextevt.expires == nextexp) && !evt->ignore)
+               if ((evt->nextevt.expires == nextexp) && !evt->ignore) {
+                       /* Make sure not to miss a new CPU event with the same expiry */
+                       evt->cpu = first_childevt->cpu;
                        goto check_toplvl;
+               }
 
                if (!timerqueue_del(&group->events, &evt->nextevt))
                        WRITE_ONCE(group->next_expiry, KTIME_MAX);
index af6cc19a200331aa0c37cf2e497384f0b19d8db0..68c97387aa54e2728b79b2cacf23815fcb2f90ca 100644 (file)
@@ -330,7 +330,7 @@ static struct stack_record *depot_pop_free_pool(void **prealloc, size_t size)
        stack = current_pool + pool_offset;
 
        /* Pre-initialize handle once. */
-       stack->handle.pool_index = pool_index + 1;
+       stack->handle.pool_index_plus_1 = pool_index + 1;
        stack->handle.offset = pool_offset >> DEPOT_STACK_ALIGN;
        stack->handle.extra = 0;
        INIT_LIST_HEAD(&stack->hash_list);
@@ -441,7 +441,7 @@ static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
        const int pools_num_cached = READ_ONCE(pools_num);
        union handle_parts parts = { .handle = handle };
        void *pool;
-       u32 pool_index = parts.pool_index - 1;
+       u32 pool_index = parts.pool_index_plus_1 - 1;
        size_t offset = parts.offset << DEPOT_STACK_ALIGN;
        struct stack_record *stack;
 
index 276c12140ee26dac37137e400f4c303d34045bb1..c288df9372ede1cbda1371ae92e20a25ae9ebe91 100644 (file)
@@ -134,7 +134,7 @@ static const test_ubsan_fp test_ubsan_array[] = {
 };
 
 /* Excluded because they Oops the module. */
-static const test_ubsan_fp skip_ubsan_array[] = {
+static __used const test_ubsan_fp skip_ubsan_array[] = {
        test_ubsan_divrem_overflow,
 };
 
index 904f70b994985a7682b59a917522f284fd786950..d2155ced45f8f84ef8eac74ba3eda42a67d37102 100644 (file)
@@ -5973,6 +5973,10 @@ int follow_phys(struct vm_area_struct *vma,
                goto out;
        pte = ptep_get(ptep);
 
+       /* Never return PFNs of anon folios in COW mappings. */
+       if (vm_normal_folio(vma, address, pte))
+               goto unlock;
+
        if ((flags & FOLL_WRITE) && !pte_write(pte))
                goto unlock;
 
index 22aa63f4ef6322a71030c5dd163706c83ef5cd8d..68fa001648cc1cb766d8fa4111e88c3fbbf257e8 100644 (file)
@@ -989,6 +989,27 @@ unsigned long vmalloc_nr_pages(void)
        return atomic_long_read(&nr_vmalloc_pages);
 }
 
+static struct vmap_area *__find_vmap_area(unsigned long addr, struct rb_root *root)
+{
+       struct rb_node *n = root->rb_node;
+
+       addr = (unsigned long)kasan_reset_tag((void *)addr);
+
+       while (n) {
+               struct vmap_area *va;
+
+               va = rb_entry(n, struct vmap_area, rb_node);
+               if (addr < va->va_start)
+                       n = n->rb_left;
+               else if (addr >= va->va_end)
+                       n = n->rb_right;
+               else
+                       return va;
+       }
+
+       return NULL;
+}
+
 /* Look up the first VA which satisfies addr < va_end, NULL if none. */
 static struct vmap_area *
 __find_vmap_area_exceed_addr(unsigned long addr, struct rb_root *root)
@@ -1025,47 +1046,39 @@ __find_vmap_area_exceed_addr(unsigned long addr, struct rb_root *root)
 static struct vmap_node *
 find_vmap_area_exceed_addr_lock(unsigned long addr, struct vmap_area **va)
 {
-       struct vmap_node *vn, *va_node = NULL;
-       struct vmap_area *va_lowest;
+       unsigned long va_start_lowest;
+       struct vmap_node *vn;
        int i;
 
-       for (i = 0; i < nr_vmap_nodes; i++) {
+repeat:
+       for (i = 0, va_start_lowest = 0; i < nr_vmap_nodes; i++) {
                vn = &vmap_nodes[i];
 
                spin_lock(&vn->busy.lock);
-               va_lowest = __find_vmap_area_exceed_addr(addr, &vn->busy.root);
-               if (va_lowest) {
-                       if (!va_node || va_lowest->va_start < (*va)->va_start) {
-                               if (va_node)
-                                       spin_unlock(&va_node->busy.lock);
-
-                               *va = va_lowest;
-                               va_node = vn;
-                               continue;
-                       }
-               }
+               *va = __find_vmap_area_exceed_addr(addr, &vn->busy.root);
+
+               if (*va)
+                       if (!va_start_lowest || (*va)->va_start < va_start_lowest)
+                               va_start_lowest = (*va)->va_start;
                spin_unlock(&vn->busy.lock);
        }
 
-       return va_node;
-}
-
-static struct vmap_area *__find_vmap_area(unsigned long addr, struct rb_root *root)
-{
-       struct rb_node *n = root->rb_node;
+       /*
+        * Check if found VA exists, it might have gone away.  In this case we
+        * repeat the search because a VA has been removed concurrently and we
+        * need to proceed to the next one, which is a rare case.
+        */
+       if (va_start_lowest) {
+               vn = addr_to_node(va_start_lowest);
 
-       addr = (unsigned long)kasan_reset_tag((void *)addr);
+               spin_lock(&vn->busy.lock);
+               *va = __find_vmap_area(va_start_lowest, &vn->busy.root);
 
-       while (n) {
-               struct vmap_area *va;
+               if (*va)
+                       return vn;
 
-               va = rb_entry(n, struct vmap_area, rb_node);
-               if (addr < va->va_start)
-                       n = n->rb_left;
-               else if (addr >= va->va_end)
-                       n = n->rb_right;
-               else
-                       return va;
+               spin_unlock(&vn->busy.lock);
+               goto repeat;
        }
 
        return NULL;
@@ -2343,6 +2356,9 @@ struct vmap_area *find_vmap_area(unsigned long addr)
        struct vmap_area *va;
        int i, j;
 
+       if (unlikely(!vmap_initialized))
+               return NULL;
+
        /*
         * An addr_to_node_id(addr) converts an address to a node index
         * where a VA is located. If VA spans several zones and passed
index e265a0ca6bddd40711235c8d7560a6f409a51241..f7e90b4769bba92ef8187b0a96cb310f0c13d5f8 100644 (file)
@@ -1583,7 +1583,7 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to,
                received = rsize;
        }
 
-       p9_debug(P9_DEBUG_9P, "<<< RREAD count %d\n", count);
+       p9_debug(P9_DEBUG_9P, "<<< RREAD count %d\n", received);
 
        if (non_zc) {
                int n = copy_to_iter(dataptr, received, to);
@@ -1609,9 +1609,6 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
        int total = 0;
        *err = 0;
 
-       p9_debug(P9_DEBUG_9P, ">>> TWRITE fid %d offset %llu count %zd\n",
-                fid->fid, offset, iov_iter_count(from));
-
        while (iov_iter_count(from)) {
                int count = iov_iter_count(from);
                int rsize = fid->iounit;
@@ -1623,6 +1620,9 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
                if (count < rsize)
                        rsize = count;
 
+               p9_debug(P9_DEBUG_9P, ">>> TWRITE fid %d offset %llu count %d (/%d)\n",
+                        fid->fid, offset, rsize, count);
+
                /* Don't bother zerocopy for small IO (< 1024) */
                if (clnt->trans_mod->zc_request && rsize > 1024) {
                        req = p9_client_zc_rpc(clnt, P9_TWRITE, NULL, from, 0,
@@ -1650,7 +1650,7 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
                        written = rsize;
                }
 
-               p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", count);
+               p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", written);
 
                p9_req_put(clnt, req);
                iov_iter_revert(from, count - written - iov_iter_count(from));
index 1a3948b8c493eda3aca297896bd8adf7a63d443a..196060dc6138af10e99ad04a76ee36a11f770c65 100644 (file)
@@ -95,7 +95,6 @@ struct p9_poll_wait {
  * @unsent_req_list: accounting for requests that haven't been sent
  * @rreq: read request
  * @wreq: write request
- * @req: current request being processed (if any)
  * @tmp_buf: temporary buffer to read in header
  * @rc: temporary fcall for reading current frame
  * @wpos: write position for current frame
index 00e02138003ecefef75714c950056ced5ccd5fda..efea25eb56ce036364c7325916326b687180bbcf 100644 (file)
@@ -105,8 +105,10 @@ void hci_req_sync_complete(struct hci_dev *hdev, u8 result, u16 opcode,
        if (hdev->req_status == HCI_REQ_PEND) {
                hdev->req_result = result;
                hdev->req_status = HCI_REQ_DONE;
-               if (skb)
+               if (skb) {
+                       kfree_skb(hdev->req_skb);
                        hdev->req_skb = skb_get(skb);
+               }
                wake_up_interruptible(&hdev->req_wait_q);
        }
 }
index 4ee1b976678b2525ff135fb947221b93923f2aee..703b84bd48d5befc51d787bcd6c04dcbcff61675 100644 (file)
@@ -1946,10 +1946,9 @@ static int hci_sock_setsockopt_old(struct socket *sock, int level, int optname,
 
        switch (optname) {
        case HCI_DATA_DIR:
-               if (copy_from_sockptr(&opt, optval, sizeof(opt))) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, len);
+               if (err)
                        break;
-               }
 
                if (opt)
                        hci_pi(sk)->cmsg_mask |= HCI_CMSG_DIR;
@@ -1958,10 +1957,9 @@ static int hci_sock_setsockopt_old(struct socket *sock, int level, int optname,
                break;
 
        case HCI_TIME_STAMP:
-               if (copy_from_sockptr(&opt, optval, sizeof(opt))) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, len);
+               if (err)
                        break;
-               }
 
                if (opt)
                        hci_pi(sk)->cmsg_mask |= HCI_CMSG_TSTAMP;
@@ -1979,11 +1977,9 @@ static int hci_sock_setsockopt_old(struct socket *sock, int level, int optname,
                        uf.event_mask[1] = *((u32 *) f->event_mask + 1);
                }
 
-               len = min_t(unsigned int, len, sizeof(uf));
-               if (copy_from_sockptr(&uf, optval, len)) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(&uf, sizeof(uf), optval, len);
+               if (err)
                        break;
-               }
 
                if (!capable(CAP_NET_RAW)) {
                        uf.type_mask &= hci_sec_filter.type_mask;
@@ -2042,10 +2038,9 @@ static int hci_sock_setsockopt(struct socket *sock, int level, int optname,
                        goto done;
                }
 
-               if (copy_from_sockptr(&opt, optval, sizeof(opt))) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, len);
+               if (err)
                        break;
-               }
 
                hci_pi(sk)->mtu = opt;
                break;
index 8fe02921adf15d4b968be310415bf9383ae3d63d..c5d8799046ccffbf798e6f47ffaef3dddcb364ca 100644 (file)
@@ -2814,8 +2814,8 @@ static int hci_le_set_ext_scan_param_sync(struct hci_dev *hdev, u8 type,
                                if (qos->bcast.in.phy & BT_ISO_PHY_CODED) {
                                        cp->scanning_phys |= LE_SCAN_PHY_CODED;
                                        hci_le_scan_phy_params(phy, type,
-                                                              interval,
-                                                              window);
+                                                              interval * 3,
+                                                              window * 3);
                                        num_phy++;
                                        phy++;
                                }
@@ -2835,7 +2835,7 @@ static int hci_le_set_ext_scan_param_sync(struct hci_dev *hdev, u8 type,
 
        if (scan_coded(hdev)) {
                cp->scanning_phys |= LE_SCAN_PHY_CODED;
-               hci_le_scan_phy_params(phy, type, interval, window);
+               hci_le_scan_phy_params(phy, type, interval * 3, window * 3);
                num_phy++;
                phy++;
        }
index c8793e57f4b547d5bd465b80575143083b867624..ef0cc80b4c0cc1ff4043d05c05fc0c429a64a6c2 100644 (file)
@@ -1451,8 +1451,8 @@ static bool check_ucast_qos(struct bt_iso_qos *qos)
 
 static bool check_bcast_qos(struct bt_iso_qos *qos)
 {
-       if (qos->bcast.sync_factor == 0x00)
-               return false;
+       if (!qos->bcast.sync_factor)
+               qos->bcast.sync_factor = 0x01;
 
        if (qos->bcast.packing > 0x01)
                return false;
@@ -1475,6 +1475,9 @@ static bool check_bcast_qos(struct bt_iso_qos *qos)
        if (qos->bcast.skip > 0x01f3)
                return false;
 
+       if (!qos->bcast.sync_timeout)
+               qos->bcast.sync_timeout = BT_ISO_SYNC_TIMEOUT;
+
        if (qos->bcast.sync_timeout < 0x000a || qos->bcast.sync_timeout > 0x4000)
                return false;
 
@@ -1484,6 +1487,9 @@ static bool check_bcast_qos(struct bt_iso_qos *qos)
        if (qos->bcast.mse > 0x1f)
                return false;
 
+       if (!qos->bcast.timeout)
+               qos->bcast.timeout = BT_ISO_SYNC_TIMEOUT;
+
        if (qos->bcast.timeout < 0x000a || qos->bcast.timeout > 0x4000)
                return false;
 
@@ -1494,7 +1500,7 @@ static int iso_sock_setsockopt(struct socket *sock, int level, int optname,
                               sockptr_t optval, unsigned int optlen)
 {
        struct sock *sk = sock->sk;
-       int len, err = 0;
+       int err = 0;
        struct bt_iso_qos qos = default_qos;
        u32 opt;
 
@@ -1509,10 +1515,9 @@ static int iso_sock_setsockopt(struct socket *sock, int level, int optname,
                        break;
                }
 
-               if (copy_from_sockptr(&opt, optval, sizeof(u32))) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen);
+               if (err)
                        break;
-               }
 
                if (opt)
                        set_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags);
@@ -1521,10 +1526,9 @@ static int iso_sock_setsockopt(struct socket *sock, int level, int optname,
                break;
 
        case BT_PKT_STATUS:
-               if (copy_from_sockptr(&opt, optval, sizeof(u32))) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen);
+               if (err)
                        break;
-               }
 
                if (opt)
                        set_bit(BT_SK_PKT_STATUS, &bt_sk(sk)->flags);
@@ -1539,17 +1543,9 @@ static int iso_sock_setsockopt(struct socket *sock, int level, int optname,
                        break;
                }
 
-               len = min_t(unsigned int, sizeof(qos), optlen);
-
-               if (copy_from_sockptr(&qos, optval, len)) {
-                       err = -EFAULT;
-                       break;
-               }
-
-               if (len == sizeof(qos.ucast) && !check_ucast_qos(&qos)) {
-                       err = -EINVAL;
+               err = bt_copy_from_sockptr(&qos, sizeof(qos), optval, optlen);
+               if (err)
                        break;
-               }
 
                iso_pi(sk)->qos = qos;
                iso_pi(sk)->qos_user_set = true;
@@ -1564,18 +1560,16 @@ static int iso_sock_setsockopt(struct socket *sock, int level, int optname,
                }
 
                if (optlen > sizeof(iso_pi(sk)->base)) {
-                       err = -EOVERFLOW;
+                       err = -EINVAL;
                        break;
                }
 
-               len = min_t(unsigned int, sizeof(iso_pi(sk)->base), optlen);
-
-               if (copy_from_sockptr(iso_pi(sk)->base, optval, len)) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(iso_pi(sk)->base, optlen, optval,
+                                          optlen);
+               if (err)
                        break;
-               }
 
-               iso_pi(sk)->base_len = len;
+               iso_pi(sk)->base_len = optlen;
 
                break;
 
index 467b242d8be071da16bd48d04e1520ce1e1aa8a6..dc089740879363dd0d6d973dcdb4fc05cfc7070a 100644 (file)
@@ -4054,8 +4054,7 @@ static int l2cap_connect_req(struct l2cap_conn *conn,
                return -EPROTO;
 
        hci_dev_lock(hdev);
-       if (hci_dev_test_flag(hdev, HCI_MGMT) &&
-           !test_and_set_bit(HCI_CONN_MGMT_CONNECTED, &hcon->flags))
+       if (hci_dev_test_flag(hdev, HCI_MGMT))
                mgmt_device_connected(hdev, hcon, NULL, 0);
        hci_dev_unlock(hdev);
 
index 4287aa6cc988e3ce34849d1f317be8fd8645832c..e7d810b23082f5ffd8ea4b506366b2684f2e1ece 100644 (file)
@@ -727,7 +727,7 @@ static int l2cap_sock_setsockopt_old(struct socket *sock, int optname,
        struct sock *sk = sock->sk;
        struct l2cap_chan *chan = l2cap_pi(sk)->chan;
        struct l2cap_options opts;
-       int len, err = 0;
+       int err = 0;
        u32 opt;
 
        BT_DBG("sk %p", sk);
@@ -754,11 +754,9 @@ static int l2cap_sock_setsockopt_old(struct socket *sock, int optname,
                opts.max_tx   = chan->max_tx;
                opts.txwin_size = chan->tx_win;
 
-               len = min_t(unsigned int, sizeof(opts), optlen);
-               if (copy_from_sockptr(&opts, optval, len)) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(&opts, sizeof(opts), optval, optlen);
+               if (err)
                        break;
-               }
 
                if (opts.txwin_size > L2CAP_DEFAULT_EXT_WINDOW) {
                        err = -EINVAL;
@@ -801,10 +799,9 @@ static int l2cap_sock_setsockopt_old(struct socket *sock, int optname,
                break;
 
        case L2CAP_LM:
-               if (copy_from_sockptr(&opt, optval, sizeof(u32))) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen);
+               if (err)
                        break;
-               }
 
                if (opt & L2CAP_LM_FIPS) {
                        err = -EINVAL;
@@ -885,7 +882,7 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname,
        struct bt_security sec;
        struct bt_power pwr;
        struct l2cap_conn *conn;
-       int len, err = 0;
+       int err = 0;
        u32 opt;
        u16 mtu;
        u8 mode;
@@ -911,11 +908,9 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname,
 
                sec.level = BT_SECURITY_LOW;
 
-               len = min_t(unsigned int, sizeof(sec), optlen);
-               if (copy_from_sockptr(&sec, optval, len)) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(&sec, sizeof(sec), optval, optlen);
+               if (err)
                        break;
-               }
 
                if (sec.level < BT_SECURITY_LOW ||
                    sec.level > BT_SECURITY_FIPS) {
@@ -960,10 +955,9 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname,
                        break;
                }
 
-               if (copy_from_sockptr(&opt, optval, sizeof(u32))) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen);
+               if (err)
                        break;
-               }
 
                if (opt) {
                        set_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags);
@@ -975,10 +969,9 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname,
                break;
 
        case BT_FLUSHABLE:
-               if (copy_from_sockptr(&opt, optval, sizeof(u32))) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen);
+               if (err)
                        break;
-               }
 
                if (opt > BT_FLUSHABLE_ON) {
                        err = -EINVAL;
@@ -1010,11 +1003,9 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname,
 
                pwr.force_active = BT_POWER_FORCE_ACTIVE_ON;
 
-               len = min_t(unsigned int, sizeof(pwr), optlen);
-               if (copy_from_sockptr(&pwr, optval, len)) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(&pwr, sizeof(pwr), optval, optlen);
+               if (err)
                        break;
-               }
 
                if (pwr.force_active)
                        set_bit(FLAG_FORCE_ACTIVE, &chan->flags);
@@ -1023,10 +1014,9 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname,
                break;
 
        case BT_CHANNEL_POLICY:
-               if (copy_from_sockptr(&opt, optval, sizeof(u32))) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen);
+               if (err)
                        break;
-               }
 
                err = -EOPNOTSUPP;
                break;
@@ -1055,10 +1045,9 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname,
                        break;
                }
 
-               if (copy_from_sockptr(&mtu, optval, sizeof(u16))) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(&mtu, sizeof(mtu), optval, optlen);
+               if (err)
                        break;
-               }
 
                if (chan->mode == L2CAP_MODE_EXT_FLOWCTL &&
                    sk->sk_state == BT_CONNECTED)
@@ -1086,10 +1075,9 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname,
                        break;
                }
 
-               if (copy_from_sockptr(&mode, optval, sizeof(u8))) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(&mode, sizeof(mode), optval, optlen);
+               if (err)
                        break;
-               }
 
                BT_DBG("mode %u", mode);
 
index b54e8a530f55a1ff9547a2a5546db34059ebd672..29aa07e9db9d7122bac6ac0c6dfcd76765f11cb8 100644 (file)
@@ -629,7 +629,7 @@ static int rfcomm_sock_setsockopt_old(struct socket *sock, int optname,
 
        switch (optname) {
        case RFCOMM_LM:
-               if (copy_from_sockptr(&opt, optval, sizeof(u32))) {
+               if (bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen)) {
                        err = -EFAULT;
                        break;
                }
@@ -664,7 +664,6 @@ static int rfcomm_sock_setsockopt(struct socket *sock, int level, int optname,
        struct sock *sk = sock->sk;
        struct bt_security sec;
        int err = 0;
-       size_t len;
        u32 opt;
 
        BT_DBG("sk %p", sk);
@@ -686,11 +685,9 @@ static int rfcomm_sock_setsockopt(struct socket *sock, int level, int optname,
 
                sec.level = BT_SECURITY_LOW;
 
-               len = min_t(unsigned int, sizeof(sec), optlen);
-               if (copy_from_sockptr(&sec, optval, len)) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(&sec, sizeof(sec), optval, optlen);
+               if (err)
                        break;
-               }
 
                if (sec.level > BT_SECURITY_HIGH) {
                        err = -EINVAL;
@@ -706,10 +703,9 @@ static int rfcomm_sock_setsockopt(struct socket *sock, int level, int optname,
                        break;
                }
 
-               if (copy_from_sockptr(&opt, optval, sizeof(u32))) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen);
+               if (err)
                        break;
-               }
 
                if (opt)
                        set_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags);
index 43daf965a01e4ac5c9329150080b00dcd63c7e1c..368e026f4d15ca4711737af941ad30c7b48b827f 100644 (file)
@@ -824,7 +824,7 @@ static int sco_sock_setsockopt(struct socket *sock, int level, int optname,
                               sockptr_t optval, unsigned int optlen)
 {
        struct sock *sk = sock->sk;
-       int len, err = 0;
+       int err = 0;
        struct bt_voice voice;
        u32 opt;
        struct bt_codecs *codecs;
@@ -843,10 +843,9 @@ static int sco_sock_setsockopt(struct socket *sock, int level, int optname,
                        break;
                }
 
-               if (copy_from_sockptr(&opt, optval, sizeof(u32))) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen);
+               if (err)
                        break;
-               }
 
                if (opt)
                        set_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags);
@@ -863,11 +862,10 @@ static int sco_sock_setsockopt(struct socket *sock, int level, int optname,
 
                voice.setting = sco_pi(sk)->setting;
 
-               len = min_t(unsigned int, sizeof(voice), optlen);
-               if (copy_from_sockptr(&voice, optval, len)) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(&voice, sizeof(voice), optval,
+                                          optlen);
+               if (err)
                        break;
-               }
 
                /* Explicitly check for these values */
                if (voice.setting != BT_VOICE_TRANSPARENT &&
@@ -890,10 +888,9 @@ static int sco_sock_setsockopt(struct socket *sock, int level, int optname,
                break;
 
        case BT_PKT_STATUS:
-               if (copy_from_sockptr(&opt, optval, sizeof(u32))) {
-                       err = -EFAULT;
+               err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen);
+               if (err)
                        break;
-               }
 
                if (opt)
                        set_bit(BT_SK_PKT_STATUS, &bt_sk(sk)->flags);
@@ -934,9 +931,9 @@ static int sco_sock_setsockopt(struct socket *sock, int level, int optname,
                        break;
                }
 
-               if (copy_from_sockptr(buffer, optval, optlen)) {
+               err = bt_copy_from_sockptr(buffer, optlen, optval, optlen);
+               if (err) {
                        hci_dev_put(hdev);
-                       err = -EFAULT;
                        break;
                }
 
index b150c9929b12e86219a55c77da480e0c538b3449..14365b20f1c5c09964dd7024060116737f22cb63 100644 (file)
@@ -966,6 +966,8 @@ static int do_replace(struct net *net, sockptr_t arg, unsigned int len)
                return -ENOMEM;
        if (tmp.num_counters == 0)
                return -EINVAL;
+       if ((u64)len < (u64)tmp.size + sizeof(tmp))
+               return -EINVAL;
 
        tmp.name[sizeof(tmp.name)-1] = 0;
 
@@ -1266,6 +1268,8 @@ static int compat_do_replace(struct net *net, sockptr_t arg, unsigned int len)
                return -ENOMEM;
        if (tmp.num_counters == 0)
                return -EINVAL;
+       if ((u64)len < (u64)tmp.size + sizeof(tmp))
+               return -EINVAL;
 
        tmp.name[sizeof(tmp.name)-1] = 0;
 
index 487670759578168c5ff53bce6642898fc41936b3..fe89a056eb06c43743b2d7449e59f4e9360ba223 100644 (file)
@@ -1118,6 +1118,8 @@ do_replace(struct net *net, sockptr_t arg, unsigned int len)
                return -ENOMEM;
        if (tmp.num_counters == 0)
                return -EINVAL;
+       if ((u64)len < (u64)tmp.size + sizeof(tmp))
+               return -EINVAL;
 
        tmp.name[sizeof(tmp.name)-1] = 0;
 
@@ -1504,6 +1506,8 @@ compat_do_replace(struct net *net, sockptr_t arg, unsigned int len)
                return -ENOMEM;
        if (tmp.num_counters == 0)
                return -EINVAL;
+       if ((u64)len < (u64)tmp.size + sizeof(tmp))
+               return -EINVAL;
 
        tmp.name[sizeof(tmp.name)-1] = 0;
 
index 636b360311c5365fba2330f6ca2f7f1b6dd1363e..131f7bb2110d3a08244c6da40ff9be45a2be711b 100644 (file)
@@ -1135,6 +1135,8 @@ do_replace(struct net *net, sockptr_t arg, unsigned int len)
                return -ENOMEM;
        if (tmp.num_counters == 0)
                return -EINVAL;
+       if ((u64)len < (u64)tmp.size + sizeof(tmp))
+               return -EINVAL;
 
        tmp.name[sizeof(tmp.name)-1] = 0;
 
@@ -1513,6 +1515,8 @@ compat_do_replace(struct net *net, sockptr_t arg, unsigned int len)
                return -ENOMEM;
        if (tmp.num_counters == 0)
                return -EINVAL;
+       if ((u64)len < (u64)tmp.size + sizeof(tmp))
+               return -EINVAL;
 
        tmp.name[sizeof(tmp.name)-1] = 0;
 
index 545017a3daa4d6b20255c51c6c0dea73ec32ecfc..6b3f01beb294b99740ae4364acbe31cc92e4a980 100644 (file)
@@ -1206,15 +1206,6 @@ err_noclose:
  * MSG_SPLICE_PAGES is used exclusively to reduce the number of
  * copy operations in this path. Therefore the caller must ensure
  * that the pages backing @xdr are unchanging.
- *
- * Note that the send is non-blocking. The caller has incremented
- * the reference count on each page backing the RPC message, and
- * the network layer will "put" these pages when transmission is
- * complete.
- *
- * This is safe for our RPC services because the memory backing
- * the head and tail components is never kmalloc'd. These always
- * come from pages in the svc_rqst::rq_pages array.
  */
 static int svc_tcp_sendmsg(struct svc_sock *svsk, struct svc_rqst *rqstp,
                           rpc_fraghdr marker, unsigned int *sentp)
@@ -1244,6 +1235,7 @@ static int svc_tcp_sendmsg(struct svc_sock *svsk, struct svc_rqst *rqstp,
        iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,
                      1 + count, sizeof(marker) + rqstp->rq_res.len);
        ret = sock_sendmsg(svsk->sk_sock, &msg);
+       page_frag_free(buf);
        if (ret < 0)
                return ret;
        *sentp += ret;
index fa39b626523851df29275f1448d30a7390e7e0fb..6433a414acf8624a1d98727f4e309b7c040710b9 100644 (file)
@@ -274,11 +274,22 @@ static void __unix_gc(struct work_struct *work)
         * receive queues.  Other, non candidate sockets _can_ be
         * added to queue, so we must make sure only to touch
         * candidates.
+        *
+        * Embryos, though never candidates themselves, affect which
+        * candidates are reachable by the garbage collector.  Before
+        * being added to a listener's queue, an embryo may already
+        * receive data carrying SCM_RIGHTS, potentially making the
+        * passed socket a candidate that is not yet reachable by the
+        * collector.  It becomes reachable once the embryo is
+        * enqueued.  Therefore, we must ensure that no SCM-laden
+        * embryo appears in a (candidate) listener's queue between
+        * consecutive scan_children() calls.
         */
        list_for_each_entry_safe(u, next, &gc_inflight_list, link) {
+               struct sock *sk = &u->sk;
                long total_refs;
 
-               total_refs = file_count(u->sk.sk_socket->file);
+               total_refs = file_count(sk->sk_socket->file);
 
                WARN_ON_ONCE(!u->inflight);
                WARN_ON_ONCE(total_refs < u->inflight);
@@ -286,6 +297,11 @@ static void __unix_gc(struct work_struct *work)
                        list_move_tail(&u->link, &gc_candidates);
                        __set_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
                        __set_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
+
+                       if (sk->sk_state == TCP_LISTEN) {
+                               unix_state_lock(sk);
+                               unix_state_unlock(sk);
+                       }
                }
        }
 
index c5c2ce113c9232c331c4ebac2ba4384a24424640..d20c47d21ad8352973c93ccefc7b0931e93de545 100644 (file)
@@ -467,6 +467,8 @@ static bool stackleak_gate(void)
                        return false;
                if (STRING_EQUAL(section, ".entry.text"))
                        return false;
+               if (STRING_EQUAL(section, ".head.text"))
+                       return false;
        }
 
        return track_frame_size >= 0;
index 0ba8f0c4cd99a27aaff1d3b43edd306daf0ad857..3a593da09280dca9e5b59dc96c6c2cb27cac6b6b 100644 (file)
@@ -725,7 +725,13 @@ static void __exit amiga_audio_remove(struct platform_device *pdev)
        dmasound_deinit();
 }
 
-static struct platform_driver amiga_audio_driver = {
+/*
+ * amiga_audio_remove() lives in .exit.text. For drivers registered via
+ * module_platform_driver_probe() this is ok because they cannot get unbound at
+ * runtime. So mark the driver struct with __refdata to prevent modpost
+ * triggering a section mismatch warning.
+ */
+static struct platform_driver amiga_audio_driver __refdata = {
        .remove_new = __exit_p(amiga_audio_remove),
        .driver = {
                .name   = "amiga-audio",
index d36234b88fb4219fec68e3f68103d01ee4c224ab..941bfbf812ed305bbfb368771d66134703ba8bea 100644 (file)
@@ -255,7 +255,7 @@ lookup_voices(struct snd_emux *emu, struct snd_emu10k1 *hw,
                /* check if sample is finished playing (non-looping only) */
                if (bp != best + V_OFF && bp != best + V_FREE &&
                    (vp->reg.sample_mode & SNDRV_SFNT_SAMPLE_SINGLESHOT)) {
-                       val = snd_emu10k1_ptr_read(hw, CCCA_CURRADDR, vp->ch) - 64;
+                       val = snd_emu10k1_ptr_read(hw, CCCA_CURRADDR, vp->ch);
                        if (val >= vp->reg.loopstart)
                                bp = best + V_OFF;
                }
@@ -362,7 +362,7 @@ start_voice(struct snd_emux_voice *vp)
 
        map = (hw->silent_page.addr << hw->address_mode) | (hw->address_mode ? MAP_PTI_MASK1 : MAP_PTI_MASK0);
 
-       addr = vp->reg.start + 64;
+       addr = vp->reg.start;
        temp = vp->reg.parm.filterQ;
        ccca = (temp << 28) | addr;
        if (vp->apitch < 0xe400)
@@ -430,9 +430,6 @@ start_voice(struct snd_emux_voice *vp)
                /* Q & current address (Q 4bit value, MSB) */
                CCCA, ccca,
 
-               /* cache */
-               CCR, REG_VAL_PUT(CCR_CACHEINVALIDSIZE, 64),
-
                /* reset volume */
                VTFT, vtarget | vp->ftarget,
                CVCF, vtarget | CVCF_CURRENTFILTER_MASK,
index 72ec872afb8d27de1d2b23288988bf4a5e4f4b88..8fb688e4141485cdf1d29ba73ed25ddc17a2a936 100644 (file)
@@ -108,7 +108,10 @@ static const struct cs35l41_config cs35l41_config_table[] = {
        { "10431F12", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 1000, 4500, 24 },
        { "10431F1F", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, -1, 0, 0, 0, 0 },
        { "10431F62", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 0, 0, 0 },
+       { "10433A60", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 },
        { "17AA386F", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, -1, -1, 0, 0, 0 },
+       { "17AA3877", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 0, 0, 0 },
+       { "17AA3878", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 0, 0, 0 },
        { "17AA38A9", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 2, -1, 0, 0, 0 },
        { "17AA38AB", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 2, -1, 0, 0, 0 },
        { "17AA38B4", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 0, 0, 0 },
@@ -496,7 +499,10 @@ static const struct cs35l41_prop_model cs35l41_prop_model_table[] = {
        { "CSC3551", "10431F12", generic_dsd_config },
        { "CSC3551", "10431F1F", generic_dsd_config },
        { "CSC3551", "10431F62", generic_dsd_config },
+       { "CSC3551", "10433A60", generic_dsd_config },
        { "CSC3551", "17AA386F", generic_dsd_config },
+       { "CSC3551", "17AA3877", generic_dsd_config },
+       { "CSC3551", "17AA3878", generic_dsd_config },
        { "CSC3551", "17AA38A9", generic_dsd_config },
        { "CSC3551", "17AA38AB", generic_dsd_config },
        { "CSC3551", "17AA38B4", generic_dsd_config },
index 13beee807308f1763145cc1f9c1590e427236dc6..40f2f97944d54c916d58279a94f84e6be69b5bba 100644 (file)
@@ -56,10 +56,19 @@ static const struct i2c_device_id cs35l56_hda_i2c_id[] = {
        {}
 };
 
+static const struct acpi_device_id cs35l56_acpi_hda_match[] = {
+       { "CSC3554", 0 },
+       { "CSC3556", 0 },
+       { "CSC3557", 0 },
+       {}
+};
+MODULE_DEVICE_TABLE(acpi, cs35l56_acpi_hda_match);
+
 static struct i2c_driver cs35l56_hda_i2c_driver = {
        .driver = {
-               .name           = "cs35l56-hda",
-               .pm             = &cs35l56_hda_pm_ops,
+               .name             = "cs35l56-hda",
+               .acpi_match_table = cs35l56_acpi_hda_match,
+               .pm               = &cs35l56_hda_pm_ops,
        },
        .id_table       = cs35l56_hda_i2c_id,
        .probe          = cs35l56_hda_i2c_probe,
index a3b2fa76663d3685cf404e59785cdc00c502e493..7f02155fe61e3cd529e4c688bf6d734848baf45d 100644 (file)
@@ -56,10 +56,19 @@ static const struct spi_device_id cs35l56_hda_spi_id[] = {
        {}
 };
 
+static const struct acpi_device_id cs35l56_acpi_hda_match[] = {
+       { "CSC3554", 0 },
+       { "CSC3556", 0 },
+       { "CSC3557", 0 },
+       {}
+};
+MODULE_DEVICE_TABLE(acpi, cs35l56_acpi_hda_match);
+
 static struct spi_driver cs35l56_hda_spi_driver = {
        .driver = {
-               .name           = "cs35l56-hda",
-               .pm             = &cs35l56_hda_pm_ops,
+               .name             = "cs35l56-hda",
+               .acpi_match_table = cs35l56_acpi_hda_match,
+               .pm               = &cs35l56_hda_pm_ops,
        },
        .id_table       = cs35l56_hda_spi_id,
        .probe          = cs35l56_hda_spi_probe,
index a17c36a36aa5375fd8295911a2ffc707cb14263e..cdcb28aa9d7bf028d429aeea0016cec7c6bc0c22 100644 (file)
@@ -6875,11 +6875,38 @@ static void alc287_fixup_legion_16ithg6_speakers(struct hda_codec *cdc, const st
        comp_generic_fixup(cdc, action, "i2c", "CLSA0101", "-%s:00-cs35l41-hda.%d", 2);
 }
 
+static void cs35l56_fixup_i2c_two(struct hda_codec *cdc, const struct hda_fixup *fix, int action)
+{
+       comp_generic_fixup(cdc, action, "i2c", "CSC3556", "-%s:00-cs35l56-hda.%d", 2);
+}
+
+static void cs35l56_fixup_i2c_four(struct hda_codec *cdc, const struct hda_fixup *fix, int action)
+{
+       comp_generic_fixup(cdc, action, "i2c", "CSC3556", "-%s:00-cs35l56-hda.%d", 4);
+}
+
+static void cs35l56_fixup_spi_two(struct hda_codec *cdc, const struct hda_fixup *fix, int action)
+{
+       comp_generic_fixup(cdc, action, "spi", "CSC3556", "-%s:00-cs35l56-hda.%d", 2);
+}
+
 static void cs35l56_fixup_spi_four(struct hda_codec *cdc, const struct hda_fixup *fix, int action)
 {
        comp_generic_fixup(cdc, action, "spi", "CSC3556", "-%s:00-cs35l56-hda.%d", 4);
 }
 
+static void alc285_fixup_asus_ga403u(struct hda_codec *cdc, const struct hda_fixup *fix, int action)
+{
+       /*
+        * The same SSID has been re-used in different hardware; they have
+        * different codecs and the newer GA403U has an ALC285.
+        */
+       if (cdc->core.vendor_id == 0x10ec0285)
+               cs35l56_fixup_i2c_two(cdc, fix, action);
+       else
+               alc_fixup_inv_dmic(cdc, fix, action);
+}
+
 static void tas2781_fixup_i2c(struct hda_codec *cdc,
        const struct hda_fixup *fix, int action)
 {
@@ -7436,6 +7463,10 @@ enum {
        ALC256_FIXUP_ACER_SFG16_MICMUTE_LED,
        ALC256_FIXUP_HEADPHONE_AMP_VOL,
        ALC245_FIXUP_HP_SPECTRE_X360_EU0XXX,
+       ALC285_FIXUP_CS35L56_SPI_2,
+       ALC285_FIXUP_CS35L56_I2C_2,
+       ALC285_FIXUP_CS35L56_I2C_4,
+       ALC285_FIXUP_ASUS_GA403U,
 };
 
 /* A special fixup for Lenovo C940 and Yoga Duet 7;
@@ -9643,6 +9674,22 @@ static const struct hda_fixup alc269_fixups[] = {
                .type = HDA_FIXUP_FUNC,
                .v.func = alc245_fixup_hp_spectre_x360_eu0xxx,
        },
+       [ALC285_FIXUP_CS35L56_SPI_2] = {
+               .type = HDA_FIXUP_FUNC,
+               .v.func = cs35l56_fixup_spi_two,
+       },
+       [ALC285_FIXUP_CS35L56_I2C_2] = {
+               .type = HDA_FIXUP_FUNC,
+               .v.func = cs35l56_fixup_i2c_two,
+       },
+       [ALC285_FIXUP_CS35L56_I2C_4] = {
+               .type = HDA_FIXUP_FUNC,
+               .v.func = cs35l56_fixup_i2c_four,
+       },
+       [ALC285_FIXUP_ASUS_GA403U] = {
+               .type = HDA_FIXUP_FUNC,
+               .v.func = alc285_fixup_asus_ga403u,
+       },
 };
 
 static const struct snd_pci_quirk alc269_fixup_tbl[] = {
@@ -10096,7 +10143,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x1043, 0x1a83, "ASUS UM5302LA", ALC294_FIXUP_CS35L41_I2C_2),
        SND_PCI_QUIRK(0x1043, 0x1a8f, "ASUS UX582ZS", ALC245_FIXUP_CS35L41_SPI_2),
        SND_PCI_QUIRK(0x1043, 0x1b11, "ASUS UX431DA", ALC294_FIXUP_ASUS_COEF_1B),
-       SND_PCI_QUIRK(0x1043, 0x1b13, "Asus U41SV", ALC269_FIXUP_INV_DMIC),
+       SND_PCI_QUIRK(0x1043, 0x1b13, "ASUS U41SV/GA403U", ALC285_FIXUP_ASUS_GA403U),
        SND_PCI_QUIRK(0x1043, 0x1b93, "ASUS G614JVR/JIR", ALC245_FIXUP_CS35L41_SPI_2),
        SND_PCI_QUIRK(0x1043, 0x1bbd, "ASUS Z550MA", ALC255_FIXUP_ASUS_MIC_NO_PRESENCE),
        SND_PCI_QUIRK(0x1043, 0x1c03, "ASUS UM3406HA", ALC287_FIXUP_CS35L41_I2C_2),
@@ -10104,6 +10151,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x1043, 0x1c33, "ASUS UX5304MA", ALC245_FIXUP_CS35L41_SPI_2),
        SND_PCI_QUIRK(0x1043, 0x1c43, "ASUS UX8406MA", ALC245_FIXUP_CS35L41_SPI_2),
        SND_PCI_QUIRK(0x1043, 0x1c62, "ASUS GU603", ALC289_FIXUP_ASUS_GA401),
+       SND_PCI_QUIRK(0x1043, 0x1c63, "ASUS GU605M", ALC285_FIXUP_CS35L56_SPI_2),
        SND_PCI_QUIRK(0x1043, 0x1c92, "ASUS ROG Strix G15", ALC285_FIXUP_ASUS_G533Z_PINS),
        SND_PCI_QUIRK(0x1043, 0x1c9f, "ASUS G614JU/JV/JI", ALC285_FIXUP_ASUS_HEADSET_MIC),
        SND_PCI_QUIRK(0x1043, 0x1caf, "ASUS G634JY/JZ/JI/JG", ALC285_FIXUP_ASUS_SPI_REAR_SPEAKERS),
@@ -10115,11 +10163,14 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x1043, 0x1d42, "ASUS Zephyrus G14 2022", ALC289_FIXUP_ASUS_GA401),
        SND_PCI_QUIRK(0x1043, 0x1d4e, "ASUS TM420", ALC256_FIXUP_ASUS_HPE),
        SND_PCI_QUIRK(0x1043, 0x1da2, "ASUS UP6502ZA/ZD", ALC245_FIXUP_CS35L41_SPI_2),
+       SND_PCI_QUIRK(0x1043, 0x1df3, "ASUS UM5606", ALC285_FIXUP_CS35L56_I2C_4),
        SND_PCI_QUIRK(0x1043, 0x1e02, "ASUS UX3402ZA", ALC245_FIXUP_CS35L41_SPI_2),
        SND_PCI_QUIRK(0x1043, 0x1e11, "ASUS Zephyrus G15", ALC289_FIXUP_ASUS_GA502),
        SND_PCI_QUIRK(0x1043, 0x1e12, "ASUS UM3402", ALC287_FIXUP_CS35L41_I2C_2),
        SND_PCI_QUIRK(0x1043, 0x1e51, "ASUS Zephyrus M15", ALC294_FIXUP_ASUS_GU502_PINS),
        SND_PCI_QUIRK(0x1043, 0x1e5e, "ASUS ROG Strix G513", ALC294_FIXUP_ASUS_G513_PINS),
+       SND_PCI_QUIRK(0x1043, 0x1e63, "ASUS H7606W", ALC285_FIXUP_CS35L56_I2C_2),
+       SND_PCI_QUIRK(0x1043, 0x1e83, "ASUS GA605W", ALC285_FIXUP_CS35L56_I2C_2),
        SND_PCI_QUIRK(0x1043, 0x1e8e, "ASUS Zephyrus G15", ALC289_FIXUP_ASUS_GA401),
        SND_PCI_QUIRK(0x1043, 0x1ee2, "ASUS UM6702RA/RC", ALC287_FIXUP_CS35L41_I2C_2),
        SND_PCI_QUIRK(0x1043, 0x1c52, "ASUS Zephyrus G15 2022", ALC289_FIXUP_ASUS_GA401),
@@ -10133,7 +10184,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x1043, 0x3a30, "ASUS G814JVR/JIR", ALC245_FIXUP_CS35L41_SPI_2),
        SND_PCI_QUIRK(0x1043, 0x3a40, "ASUS G814JZR", ALC245_FIXUP_CS35L41_SPI_2),
        SND_PCI_QUIRK(0x1043, 0x3a50, "ASUS G834JYR/JZR", ALC245_FIXUP_CS35L41_SPI_2),
-       SND_PCI_QUIRK(0x1043, 0x3a60, "ASUS G634JYR/JZR", ALC245_FIXUP_CS35L41_SPI_2),
+       SND_PCI_QUIRK(0x1043, 0x3a60, "ASUS G634JYR/JZR", ALC285_FIXUP_ASUS_SPI_REAR_SPEAKERS),
        SND_PCI_QUIRK(0x1043, 0x831a, "ASUS P901", ALC269_FIXUP_STEREO_DMIC),
        SND_PCI_QUIRK(0x1043, 0x834a, "ASUS S101", ALC269_FIXUP_STEREO_DMIC),
        SND_PCI_QUIRK(0x1043, 0x8398, "ASUS P1005", ALC269_FIXUP_STEREO_DMIC),
@@ -10159,7 +10210,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x10ec, 0x1254, "Intel Reference board", ALC295_FIXUP_CHROME_BOOK),
        SND_PCI_QUIRK(0x10ec, 0x12cc, "Intel Reference board", ALC295_FIXUP_CHROME_BOOK),
        SND_PCI_QUIRK(0x10ec, 0x12f6, "Intel Reference board", ALC295_FIXUP_CHROME_BOOK),
-       SND_PCI_QUIRK(0x10f7, 0x8338, "Panasonic CF-SZ6", ALC269_FIXUP_HEADSET_MODE),
+       SND_PCI_QUIRK(0x10f7, 0x8338, "Panasonic CF-SZ6", ALC269_FIXUP_ASPIRE_HEADSET_MIC),
        SND_PCI_QUIRK(0x144d, 0xc109, "Samsung Ativ book 9 (NP900X3G)", ALC269_FIXUP_INV_DMIC),
        SND_PCI_QUIRK(0x144d, 0xc169, "Samsung Notebook 9 Pen (NP930SBE-K01US)", ALC298_FIXUP_SAMSUNG_AMP),
        SND_PCI_QUIRK(0x144d, 0xc176, "Samsung Notebook 9 Pro (NP930MBE-K04US)", ALC298_FIXUP_SAMSUNG_AMP),
@@ -10333,6 +10384,8 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x17aa, 0x3869, "Lenovo Yoga7 14IAL7", ALC287_FIXUP_YOGA9_14IAP7_BASS_SPK_PIN),
        SND_PCI_QUIRK(0x17aa, 0x386f, "Legion 7i 16IAX7", ALC287_FIXUP_CS35L41_I2C_2),
        SND_PCI_QUIRK(0x17aa, 0x3870, "Lenovo Yoga 7 14ARB7", ALC287_FIXUP_YOGA7_14ARB7_I2C),
+       SND_PCI_QUIRK(0x17aa, 0x3877, "Lenovo Legion 7 Slim 16ARHA7", ALC287_FIXUP_CS35L41_I2C_2),
+       SND_PCI_QUIRK(0x17aa, 0x3878, "Lenovo Legion 7 Slim 16ARHA7", ALC287_FIXUP_CS35L41_I2C_2),
        SND_PCI_QUIRK(0x17aa, 0x387d, "Yoga S780-16 pro Quad AAC", ALC287_FIXUP_TAS2781_I2C),
        SND_PCI_QUIRK(0x17aa, 0x387e, "Yoga S780-16 pro Quad YC", ALC287_FIXUP_TAS2781_I2C),
        SND_PCI_QUIRK(0x17aa, 0x3881, "YB9 dual power mode2 YC", ALC287_FIXUP_TAS2781_I2C),
@@ -10403,6 +10456,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x1d05, 0x1147, "TongFang GMxTGxx", ALC269_FIXUP_NO_SHUTUP),
        SND_PCI_QUIRK(0x1d05, 0x115c, "TongFang GMxTGxx", ALC269_FIXUP_NO_SHUTUP),
        SND_PCI_QUIRK(0x1d05, 0x121b, "TongFang GMxAGxx", ALC269_FIXUP_NO_SHUTUP),
+       SND_PCI_QUIRK(0x1d05, 0x1387, "TongFang GMxIXxx", ALC2XX_FIXUP_HEADSET_MIC),
        SND_PCI_QUIRK(0x1d72, 0x1602, "RedmiBook", ALC255_FIXUP_XIAOMI_HEADSET_MIC),
        SND_PCI_QUIRK(0x1d72, 0x1701, "XiaomiNotebook Pro", ALC298_FIXUP_DELL1_MIC_NO_PRESENCE),
        SND_PCI_QUIRK(0x1d72, 0x1901, "RedmiBook 14", ALC256_FIXUP_ASUS_HEADSET_MIC),
index 8c8b1dcac6281c1d7448a529ba316d5346fd9030..5f35b90eab8d3f1aa46e6d4ea6bdd8d6d49638a7 100644 (file)
@@ -115,7 +115,10 @@ static int acp_pci_probe(struct pci_dev *pci, const struct pci_device_id *pci_id
                goto unregister_dmic_dev;
        }
 
-       acp_init(chip);
+       ret = acp_init(chip);
+       if (ret)
+               goto unregister_dmic_dev;
+
        res = devm_kcalloc(&pci->dev, num_res, sizeof(struct resource), GFP_KERNEL);
        if (!res) {
                ret = -ENOMEM;
@@ -133,11 +136,9 @@ static int acp_pci_probe(struct pci_dev *pci, const struct pci_device_id *pci_id
                }
        }
 
-       if (flag == FLAG_AMD_LEGACY_ONLY_DMIC) {
-               ret = check_acp_pdm(pci, chip);
-               if (ret < 0)
-                       goto skip_pdev_creation;
-       }
+       ret = check_acp_pdm(pci, chip);
+       if (ret < 0)
+               goto skip_pdev_creation;
 
        chip->flag = flag;
        memset(&pdevinfo, 0, sizeof(pdevinfo));
index 01ef4db5407da52cc04c6813ba07970c24d91541..287ac01a387357beffe2a8f76ab5f3ec15dc7b3d 100644 (file)
@@ -56,6 +56,11 @@ static int _cs_amp_write_cal_coeffs(struct cs_dsp *dsp,
        dev_dbg(dsp->dev, "Calibration: Ambient=%#x, Status=%#x, CalR=%d\n",
                data->calAmbient, data->calStatus, data->calR);
 
+       if (list_empty(&dsp->ctl_list)) {
+               dev_info(dsp->dev, "Calibration disabled due to missing firmware controls\n");
+               return -ENOENT;
+       }
+
        ret = cs_amp_write_cal_coeff(dsp, controls, controls->ambient, data->calAmbient);
        if (ret)
                return ret;
index 860d5cda67bffe83c5c4934497cdf8c1c510922b..94685449f0f48c9b7bd534b1947da07ab26fad53 100644 (file)
@@ -2364,7 +2364,8 @@ static int cs42l43_codec_runtime_resume(struct device *dev)
 
 static int cs42l43_codec_suspend(struct device *dev)
 {
-       struct cs42l43 *cs42l43 = dev_get_drvdata(dev);
+       struct cs42l43_codec *priv = dev_get_drvdata(dev);
+       struct cs42l43 *cs42l43 = priv->core;
 
        disable_irq(cs42l43->irq);
 
@@ -2373,7 +2374,8 @@ static int cs42l43_codec_suspend(struct device *dev)
 
 static int cs42l43_codec_suspend_noirq(struct device *dev)
 {
-       struct cs42l43 *cs42l43 = dev_get_drvdata(dev);
+       struct cs42l43_codec *priv = dev_get_drvdata(dev);
+       struct cs42l43 *cs42l43 = priv->core;
 
        enable_irq(cs42l43->irq);
 
@@ -2382,7 +2384,8 @@ static int cs42l43_codec_suspend_noirq(struct device *dev)
 
 static int cs42l43_codec_resume(struct device *dev)
 {
-       struct cs42l43 *cs42l43 = dev_get_drvdata(dev);
+       struct cs42l43_codec *priv = dev_get_drvdata(dev);
+       struct cs42l43 *cs42l43 = priv->core;
 
        enable_irq(cs42l43->irq);
 
@@ -2391,7 +2394,8 @@ static int cs42l43_codec_resume(struct device *dev)
 
 static int cs42l43_codec_resume_noirq(struct device *dev)
 {
-       struct cs42l43 *cs42l43 = dev_get_drvdata(dev);
+       struct cs42l43_codec *priv = dev_get_drvdata(dev);
+       struct cs42l43 *cs42l43 = priv->core;
 
        disable_irq(cs42l43->irq);
 
index 15289dadafea091d2693149e600d72e0cbb975c0..17bd6b5160772e01d8597767868a9d2472cae276 100644 (file)
@@ -412,9 +412,9 @@ static const struct _coeff_div coeff_div_v3[] = {
        {125, 48000, 6000000, 0x04, 0x04, 0x1F, 0x2D, 0x8A, 0x0A, 0x27, 0x27},
 
        {128, 8000, 1024000, 0x60, 0x00, 0x05, 0x75, 0x8A, 0x1B, 0x1F, 0x7F},
-       {128, 16000, 2048000, 0x20, 0x00, 0x31, 0x35, 0x8A, 0x1B, 0x1F, 0x3F},
-       {128, 44100, 5644800, 0xE0, 0x00, 0x01, 0x2D, 0xCA, 0x0A, 0x1F, 0x1F},
-       {128, 48000, 6144000, 0xE0, 0x00, 0x01, 0x2D, 0xCA, 0x0A, 0x1F, 0x1F},
+       {128, 16000, 2048000, 0x20, 0x00, 0x31, 0x35, 0x08, 0x19, 0x1F, 0x3F},
+       {128, 44100, 5644800, 0xE0, 0x00, 0x01, 0x2D, 0x48, 0x08, 0x1F, 0x1F},
+       {128, 48000, 6144000, 0xE0, 0x00, 0x01, 0x2D, 0x48, 0x08, 0x1F, 0x1F},
        {144, 8000, 1152000, 0x20, 0x00, 0x03, 0x35, 0x8A, 0x1B, 0x23, 0x47},
        {144, 16000, 2304000, 0x20, 0x00, 0x11, 0x35, 0x8A, 0x1B, 0x23, 0x47},
        {192, 8000, 1536000, 0x60, 0x02, 0x0D, 0x75, 0x8A, 0x1B, 0x1F, 0x7F},
@@ -423,10 +423,10 @@ static const struct _coeff_div coeff_div_v3[] = {
 
        {200, 48000, 9600000, 0x04, 0x04, 0x0F, 0x2D, 0xCA, 0x0A, 0x1F, 0x1F},
        {250, 48000, 12000000, 0x04, 0x04, 0x0F, 0x2D, 0xCA, 0x0A, 0x27, 0x27},
-       {256, 8000, 2048000, 0x60, 0x00, 0x31, 0x35, 0x8A, 0x1B, 0x1F, 0x7F},
-       {256, 16000, 4096000, 0x20, 0x00, 0x01, 0x35, 0x8A, 0x1B, 0x1F, 0x3F},
-       {256, 44100, 11289600, 0xE0, 0x00, 0x30, 0x2D, 0xCA, 0x0A, 0x1F, 0x1F},
-       {256, 48000, 12288000, 0xE0, 0x00, 0x30, 0x2D, 0xCA, 0x0A, 0x1F, 0x1F},
+       {256, 8000, 2048000, 0x60, 0x00, 0x31, 0x35, 0x08, 0x19, 0x1F, 0x7F},
+       {256, 16000, 4096000, 0x20, 0x00, 0x01, 0x35, 0x08, 0x19, 0x1F, 0x3F},
+       {256, 44100, 11289600, 0xE0, 0x01, 0x01, 0x2D, 0x48, 0x08, 0x1F, 0x1F},
+       {256, 48000, 12288000, 0xE0, 0x01, 0x01, 0x2D, 0x48, 0x08, 0x1F, 0x1F},
        {288, 8000, 2304000, 0x20, 0x00, 0x01, 0x35, 0x8A, 0x1B, 0x23, 0x47},
        {384, 8000, 3072000, 0x60, 0x02, 0x05, 0x75, 0x8A, 0x1B, 0x1F, 0x7F},
        {384, 16000, 6144000, 0x20, 0x02, 0x03, 0x35, 0x8A, 0x1B, 0x1F, 0x3F},
@@ -435,10 +435,10 @@ static const struct _coeff_div coeff_div_v3[] = {
 
        {400, 48000, 19200000, 0xE4, 0x04, 0x35, 0x6d, 0xCA, 0x0A, 0x1F, 0x1F},
        {500, 48000, 24000000, 0xF8, 0x04, 0x3F, 0x6D, 0xCA, 0x0A, 0x1F, 0x1F},
-       {512, 8000, 4096000, 0x60, 0x00, 0x01, 0x35, 0x8A, 0x1B, 0x1F, 0x7F},
-       {512, 16000, 8192000, 0x20, 0x00, 0x30, 0x35, 0x8A, 0x1B, 0x1F, 0x3F},
-       {512, 44100, 22579200, 0xE0, 0x00, 0x00, 0x2D, 0xCA, 0x0A, 0x1F, 0x1F},
-       {512, 48000, 24576000, 0xE0, 0x00, 0x00, 0x2D, 0xCA, 0x0A, 0x1F, 0x1F},
+       {512, 8000, 4096000, 0x60, 0x00, 0x01, 0x08, 0x19, 0x1B, 0x1F, 0x7F},
+       {512, 16000, 8192000, 0x20, 0x00, 0x30, 0x35, 0x08, 0x19, 0x1F, 0x3F},
+       {512, 44100, 22579200, 0xE0, 0x00, 0x00, 0x2D, 0x48, 0x08, 0x1F, 0x1F},
+       {512, 48000, 24576000, 0xE0, 0x00, 0x00, 0x2D, 0x48, 0x08, 0x1F, 0x1F},
        {768, 8000, 6144000, 0x60, 0x02, 0x11, 0x35, 0x8A, 0x1B, 0x1F, 0x7F},
        {768, 16000, 12288000, 0x20, 0x02, 0x01, 0x35, 0x8A, 0x1B, 0x1F, 0x3F},
        {768, 32000, 24576000, 0xE0, 0x02, 0x30, 0x2D, 0xCA, 0x0A, 0x1F, 0x1F},
@@ -835,7 +835,6 @@ static void es8326_jack_detect_handler(struct work_struct *work)
                        dev_dbg(comp->dev, "Report hp remove event\n");
                        snd_soc_jack_report(es8326->jack, 0, SND_JACK_HEADSET);
                        /* mute adc when mic path switch */
-                       regmap_write(es8326->regmap, ES8326_ADC_SCALE, 0x33);
                        regmap_write(es8326->regmap, ES8326_ADC1_SRC, 0x44);
                        regmap_write(es8326->regmap, ES8326_ADC2_SRC, 0x66);
                        es8326->hp = 0;
@@ -843,6 +842,7 @@ static void es8326_jack_detect_handler(struct work_struct *work)
                regmap_update_bits(es8326->regmap, ES8326_HPDET_TYPE, 0x03, 0x01);
                regmap_write(es8326->regmap, ES8326_SYS_BIAS, 0x0a);
                regmap_update_bits(es8326->regmap, ES8326_HP_DRIVER_REF, 0x0f, 0x03);
+               regmap_write(es8326->regmap, ES8326_INT_SOURCE, ES8326_INT_SRC_PIN9);
                /*
                 * Inverted HPJACK_POL bit to trigger one IRQ to double check HP Removal event
                 */
@@ -865,6 +865,8 @@ static void es8326_jack_detect_handler(struct work_struct *work)
                         * set auto-check mode, then restart jack_detect_work after 400ms.
                         * Don't report jack status.
                         */
+                       regmap_write(es8326->regmap, ES8326_INT_SOURCE,
+                                       (ES8326_INT_SRC_PIN9 | ES8326_INT_SRC_BUTTON));
                        regmap_update_bits(es8326->regmap, ES8326_HPDET_TYPE, 0x03, 0x01);
                        es8326_enable_micbias(es8326->component);
                        usleep_range(50000, 70000);
@@ -891,7 +893,6 @@ static void es8326_jack_detect_handler(struct work_struct *work)
                        snd_soc_jack_report(es8326->jack,
                                        SND_JACK_HEADSET, SND_JACK_HEADSET);
 
-                       regmap_write(es8326->regmap, ES8326_ADC_SCALE, 0x33);
                        regmap_update_bits(es8326->regmap, ES8326_PGA_PDN,
                                        0x08, 0x08);
                        regmap_update_bits(es8326->regmap, ES8326_PGAGAIN,
@@ -987,7 +988,7 @@ static int es8326_resume(struct snd_soc_component *component)
        regmap_write(es8326->regmap, ES8326_VMIDSEL, 0x0E);
        regmap_write(es8326->regmap, ES8326_ANA_LP, 0xf0);
        usleep_range(10000, 15000);
-       regmap_write(es8326->regmap, ES8326_HPJACK_TIMER, 0xe9);
+       regmap_write(es8326->regmap, ES8326_HPJACK_TIMER, 0xd9);
        regmap_write(es8326->regmap, ES8326_ANA_MICBIAS, 0xcb);
        /* set headphone default type and detect pin */
        regmap_write(es8326->regmap, ES8326_HPDET_TYPE, 0x83);
@@ -1038,8 +1039,7 @@ static int es8326_resume(struct snd_soc_component *component)
        es8326_enable_micbias(es8326->component);
        usleep_range(50000, 70000);
        regmap_update_bits(es8326->regmap, ES8326_HPDET_TYPE, 0x03, 0x00);
-       regmap_write(es8326->regmap, ES8326_INT_SOURCE,
-                   (ES8326_INT_SRC_PIN9 | ES8326_INT_SRC_BUTTON));
+       regmap_write(es8326->regmap, ES8326_INT_SOURCE, ES8326_INT_SRC_PIN9);
        regmap_write(es8326->regmap, ES8326_INTOUT_IO,
                     es8326->interrupt_clk);
        regmap_write(es8326->regmap, ES8326_SDINOUT1_IO,
@@ -1060,6 +1060,8 @@ static int es8326_resume(struct snd_soc_component *component)
        es8326->hp = 0;
        es8326->hpl_vol = 0x03;
        es8326->hpr_vol = 0x03;
+
+       es8326_irq(es8326->irq, es8326);
        return 0;
 }
 
@@ -1070,6 +1072,9 @@ static int es8326_suspend(struct snd_soc_component *component)
        cancel_delayed_work_sync(&es8326->jack_detect_work);
        es8326_disable_micbias(component);
        es8326->calibrated = false;
+       regmap_write(es8326->regmap, ES8326_CLK_MUX, 0x2d);
+       regmap_write(es8326->regmap, ES8326_DAC2HPMIX, 0x00);
+       regmap_write(es8326->regmap, ES8326_ANA_PDN, 0x3b);
        regmap_write(es8326->regmap, ES8326_CLK_CTL, ES8326_CLK_OFF);
        regcache_cache_only(es8326->regmap, true);
        regcache_mark_dirty(es8326->regmap);
index ee12caef810532380cdf1b5d6b0b204afef78e63..c3e52e7bdef57de0377cb7b467bf6fd8fd62b8c9 100644 (file)
 #define ES8326_MUTE (3 << 0)
 
 /* ES8326_CLK_CTL */
-#define ES8326_CLK_ON (0x7e << 0)
+#define ES8326_CLK_ON (0x7f << 0)
 #define ES8326_CLK_OFF (0 << 0)
 
 /* ES8326_CLK_INV */
index 47511f70119ae3b1d810ce8561d6026ccbbd98da..0b3bf920bcab2307c0107387e0ad728552bb6b9c 100644 (file)
@@ -537,7 +537,7 @@ static int rt1316_sdw_hw_params(struct snd_pcm_substream *substream,
        retval = sdw_stream_add_slave(rt1316->sdw_slave, &stream_config,
                                &port_config, 1, sdw_stream);
        if (retval) {
-               dev_err(dai->dev, "Unable to configure port\n");
+               dev_err(dai->dev, "%s: Unable to configure port\n", __func__);
                return retval;
        }
 
@@ -577,12 +577,12 @@ static int rt1316_sdw_parse_dt(struct rt1316_sdw_priv *rt1316, struct device *de
        if (rt1316->bq_params_cnt) {
                rt1316->bq_params = devm_kzalloc(dev, rt1316->bq_params_cnt, GFP_KERNEL);
                if (!rt1316->bq_params) {
-                       dev_err(dev, "Could not allocate bq_params memory\n");
+                       dev_err(dev, "%s: Could not allocate bq_params memory\n", __func__);
                        ret = -ENOMEM;
                } else {
                        ret = device_property_read_u8_array(dev, "realtek,bq-params", rt1316->bq_params, rt1316->bq_params_cnt);
                        if (ret < 0)
-                               dev_err(dev, "Could not read list of realtek,bq-params\n");
+                               dev_err(dev, "%s: Could not read list of realtek,bq-params\n", __func__);
                }
        }
 
@@ -759,7 +759,7 @@ static int __maybe_unused rt1316_dev_resume(struct device *dev)
        time = wait_for_completion_timeout(&slave->initialization_complete,
                                msecs_to_jiffies(RT1316_PROBE_TIMEOUT));
        if (!time) {
-               dev_err(&slave->dev, "Initialization not complete, timed out\n");
+               dev_err(&slave->dev, "%s: Initialization not complete, timed out\n", __func__);
                sdw_show_ping_status(slave->bus, true);
 
                return -ETIMEDOUT;
index ff364bde4a084943d78da479dc876f7b328eb02b..462c9a4b1be5ddb27c078b4c49e9c2ee3737e467 100644 (file)
@@ -606,7 +606,7 @@ static int rt1318_sdw_hw_params(struct snd_pcm_substream *substream,
        retval = sdw_stream_add_slave(rt1318->sdw_slave, &stream_config,
                                &port_config, 1, sdw_stream);
        if (retval) {
-               dev_err(dai->dev, "Unable to configure port\n");
+               dev_err(dai->dev, "%s: Unable to configure port\n", __func__);
                return retval;
        }
 
@@ -631,8 +631,8 @@ static int rt1318_sdw_hw_params(struct snd_pcm_substream *substream,
                sampling_rate = RT1318_SDCA_RATE_192000HZ;
                break;
        default:
-               dev_err(component->dev, "Rate %d is not supported\n",
-                       params_rate(params));
+               dev_err(component->dev, "%s: Rate %d is not supported\n",
+                       __func__, params_rate(params));
                return -EINVAL;
        }
 
@@ -835,7 +835,7 @@ static int __maybe_unused rt1318_dev_resume(struct device *dev)
        time = wait_for_completion_timeout(&slave->initialization_complete,
                                msecs_to_jiffies(RT1318_PROBE_TIMEOUT));
        if (!time) {
-               dev_err(&slave->dev, "Initialization not complete, timed out\n");
+               dev_err(&slave->dev, "%s: Initialization not complete, timed out\n", __func__);
                return -ETIMEDOUT;
        }
 
index e67c2e19cb1a7291170ada3b69cbdda4aadb8b6c..f9ee42c13dbac34afd0f79ff5299050106d82357 100644 (file)
@@ -132,7 +132,7 @@ static int rt5682_sdw_hw_params(struct snd_pcm_substream *substream,
        retval = sdw_stream_add_slave(rt5682->slave, &stream_config,
                                      &port_config, 1, sdw_stream);
        if (retval) {
-               dev_err(dai->dev, "Unable to configure port\n");
+               dev_err(dai->dev, "%s: Unable to configure port\n", __func__);
                return retval;
        }
 
@@ -315,8 +315,8 @@ static int rt5682_sdw_init(struct device *dev, struct regmap *regmap,
                                          &rt5682_sdw_indirect_regmap);
        if (IS_ERR(rt5682->regmap)) {
                ret = PTR_ERR(rt5682->regmap);
-               dev_err(dev, "Failed to allocate register map: %d\n",
-                       ret);
+               dev_err(dev, "%s: Failed to allocate register map: %d\n",
+                       __func__, ret);
                return ret;
        }
 
@@ -400,7 +400,7 @@ static int rt5682_io_init(struct device *dev, struct sdw_slave *slave)
        }
 
        if (val != DEVICE_ID) {
-               dev_err(dev, "Device with ID register %x is not rt5682\n", val);
+               dev_err(dev, "%s: Device with ID register %x is not rt5682\n", __func__, val);
                ret = -ENODEV;
                goto err_nodev;
        }
@@ -648,7 +648,7 @@ static int rt5682_bus_config(struct sdw_slave *slave,
 
        ret = rt5682_clock_config(&slave->dev);
        if (ret < 0)
-               dev_err(&slave->dev, "Invalid clk config");
+               dev_err(&slave->dev, "%s: Invalid clk config", __func__);
 
        return ret;
 }
@@ -763,19 +763,19 @@ static int __maybe_unused rt5682_dev_resume(struct device *dev)
                return 0;
 
        if (!slave->unattach_request) {
+               mutex_lock(&rt5682->disable_irq_lock);
                if (rt5682->disable_irq == true) {
-                       mutex_lock(&rt5682->disable_irq_lock);
                        sdw_write_no_pm(slave, SDW_SCP_INTMASK1, SDW_SCP_INT1_IMPL_DEF);
                        rt5682->disable_irq = false;
-                       mutex_unlock(&rt5682->disable_irq_lock);
                }
+               mutex_unlock(&rt5682->disable_irq_lock);
                goto regmap_sync;
        }
 
        time = wait_for_completion_timeout(&slave->initialization_complete,
                                msecs_to_jiffies(RT5682_PROBE_TIMEOUT));
        if (!time) {
-               dev_err(&slave->dev, "Initialization not complete, timed out\n");
+               dev_err(&slave->dev, "%s: Initialization not complete, timed out\n", __func__);
                sdw_show_ping_status(slave->bus, true);
 
                return -ETIMEDOUT;
index 0ebf344a1b6094a38a4c38c6817b8cf0c9242f48..434b926f96c8376c1c5b8b73c37b01b2d64641fe 100644 (file)
@@ -37,8 +37,8 @@ static int rt700_index_write(struct regmap *regmap,
 
        ret = regmap_write(regmap, addr, value);
        if (ret < 0)
-               pr_err("Failed to set private value: %06x <= %04x ret=%d\n",
-                       addr, value, ret);
+               pr_err("%s: Failed to set private value: %06x <= %04x ret=%d\n",
+                      __func__, addr, value, ret);
 
        return ret;
 }
@@ -52,8 +52,8 @@ static int rt700_index_read(struct regmap *regmap,
        *value = 0;
        ret = regmap_read(regmap, addr, value);
        if (ret < 0)
-               pr_err("Failed to get private value: %06x => %04x ret=%d\n",
-                       addr, *value, ret);
+               pr_err("%s: Failed to get private value: %06x => %04x ret=%d\n",
+                      __func__, addr, *value, ret);
 
        return ret;
 }
@@ -930,14 +930,14 @@ static int rt700_pcm_hw_params(struct snd_pcm_substream *substream,
                port_config.num += 2;
                break;
        default:
-               dev_err(component->dev, "Invalid DAI id %d\n", dai->id);
+               dev_err(component->dev, "%s: Invalid DAI id %d\n", __func__, dai->id);
                return -EINVAL;
        }
 
        retval = sdw_stream_add_slave(rt700->slave, &stream_config,
                                        &port_config, 1, sdw_stream);
        if (retval) {
-               dev_err(dai->dev, "Unable to configure port\n");
+               dev_err(dai->dev, "%s: Unable to configure port\n", __func__);
                return retval;
        }
 
@@ -945,8 +945,8 @@ static int rt700_pcm_hw_params(struct snd_pcm_substream *substream,
                /* bit 3:0 Number of Channel */
                val |= (params_channels(params) - 1);
        } else {
-               dev_err(component->dev, "Unsupported channels %d\n",
-                       params_channels(params));
+               dev_err(component->dev, "%s: Unsupported channels %d\n",
+                       __func__, params_channels(params));
                return -EINVAL;
        }
 
index 935e597022d3242187b378107e302a36bedd17f5..2636c2eea4bc8be6af732d3c2d5f03b8d78be22f 100644 (file)
@@ -438,20 +438,20 @@ static int __maybe_unused rt711_sdca_dev_resume(struct device *dev)
                return 0;
 
        if (!slave->unattach_request) {
+               mutex_lock(&rt711->disable_irq_lock);
                if (rt711->disable_irq == true) {
-                       mutex_lock(&rt711->disable_irq_lock);
                        sdw_write_no_pm(slave, SDW_SCP_SDCA_INTMASK1, SDW_SCP_SDCA_INTMASK_SDCA_0);
                        sdw_write_no_pm(slave, SDW_SCP_SDCA_INTMASK2, SDW_SCP_SDCA_INTMASK_SDCA_8);
                        rt711->disable_irq = false;
-                       mutex_unlock(&rt711->disable_irq_lock);
                }
+               mutex_unlock(&rt711->disable_irq_lock);
                goto regmap_sync;
        }
 
        time = wait_for_completion_timeout(&slave->initialization_complete,
                                msecs_to_jiffies(RT711_PROBE_TIMEOUT));
        if (!time) {
-               dev_err(&slave->dev, "Initialization not complete, timed out\n");
+               dev_err(&slave->dev, "%s: Initialization not complete, timed out\n", __func__);
                sdw_show_ping_status(slave->bus, true);
 
                return -ETIMEDOUT;
index 447154cb60104d31bb66ef268d7b60c513e273e3..1e8dbfc3ecd969be3a87cb5f00aed853a56e2a41 100644 (file)
@@ -36,8 +36,8 @@ static int rt711_sdca_index_write(struct rt711_sdca_priv *rt711,
        ret = regmap_write(regmap, addr, value);
        if (ret < 0)
                dev_err(&rt711->slave->dev,
-                       "Failed to set private value: %06x <= %04x ret=%d\n",
-                       addr, value, ret);
+                       "%s: Failed to set private value: %06x <= %04x ret=%d\n",
+                       __func__, addr, value, ret);
 
        return ret;
 }
@@ -52,8 +52,8 @@ static int rt711_sdca_index_read(struct rt711_sdca_priv *rt711,
        ret = regmap_read(regmap, addr, value);
        if (ret < 0)
                dev_err(&rt711->slave->dev,
-                       "Failed to get private value: %06x => %04x ret=%d\n",
-                       addr, *value, ret);
+                       "%s: Failed to get private value: %06x => %04x ret=%d\n",
+                       __func__, addr, *value, ret);
 
        return ret;
 }
@@ -1293,13 +1293,13 @@ static int rt711_sdca_pcm_hw_params(struct snd_pcm_substream *substream,
        retval = sdw_stream_add_slave(rt711->slave, &stream_config,
                                        &port_config, 1, sdw_stream);
        if (retval) {
-               dev_err(dai->dev, "Unable to configure port\n");
+               dev_err(dai->dev, "%s: Unable to configure port\n", __func__);
                return retval;
        }
 
        if (params_channels(params) > 16) {
-               dev_err(component->dev, "Unsupported channels %d\n",
-                       params_channels(params));
+               dev_err(component->dev, "%s: Unsupported channels %d\n",
+                       __func__, params_channels(params));
                return -EINVAL;
        }
 
@@ -1318,8 +1318,8 @@ static int rt711_sdca_pcm_hw_params(struct snd_pcm_substream *substream,
                sampling_rate = RT711_SDCA_RATE_192000HZ;
                break;
        default:
-               dev_err(component->dev, "Rate %d is not supported\n",
-                       params_rate(params));
+               dev_err(component->dev, "%s: Rate %d is not supported\n",
+                       __func__, params_rate(params));
                return -EINVAL;
        }
 
index 3f5773310ae8cc3b5d94f76aa481724ac35bad0a..0d3b43dd22e63d2343b0ce882166ca7bedc7b67c 100644 (file)
@@ -408,7 +408,7 @@ static int rt711_bus_config(struct sdw_slave *slave,
 
        ret = rt711_clock_config(&slave->dev);
        if (ret < 0)
-               dev_err(&slave->dev, "Invalid clk config");
+               dev_err(&slave->dev, "%s: Invalid clk config", __func__);
 
        return ret;
 }
@@ -536,19 +536,19 @@ static int __maybe_unused rt711_dev_resume(struct device *dev)
                return 0;
 
        if (!slave->unattach_request) {
+               mutex_lock(&rt711->disable_irq_lock);
                if (rt711->disable_irq == true) {
-                       mutex_lock(&rt711->disable_irq_lock);
                        sdw_write_no_pm(slave, SDW_SCP_INTMASK1, SDW_SCP_INT1_IMPL_DEF);
                        rt711->disable_irq = false;
-                       mutex_unlock(&rt711->disable_irq_lock);
                }
+               mutex_unlock(&rt711->disable_irq_lock);
                goto regmap_sync;
        }
 
        time = wait_for_completion_timeout(&slave->initialization_complete,
                                msecs_to_jiffies(RT711_PROBE_TIMEOUT));
        if (!time) {
-               dev_err(&slave->dev, "Initialization not complete, timed out\n");
+               dev_err(&slave->dev, "%s: Initialization not complete, timed out\n", __func__);
                return -ETIMEDOUT;
        }
 
index 66eaed13b0d6a06ff1a649be6924e16850e65997..5446f9506a16722e8a43571631d109fa27c9fe65 100644 (file)
@@ -37,8 +37,8 @@ static int rt711_index_write(struct regmap *regmap,
 
        ret = regmap_write(regmap, addr, value);
        if (ret < 0)
-               pr_err("Failed to set private value: %06x <= %04x ret=%d\n",
-                       addr, value, ret);
+               pr_err("%s: Failed to set private value: %06x <= %04x ret=%d\n",
+                      __func__, addr, value, ret);
 
        return ret;
 }
@@ -52,8 +52,8 @@ static int rt711_index_read(struct regmap *regmap,
        *value = 0;
        ret = regmap_read(regmap, addr, value);
        if (ret < 0)
-               pr_err("Failed to get private value: %06x => %04x ret=%d\n",
-                       addr, *value, ret);
+               pr_err("%s: Failed to get private value: %06x => %04x ret=%d\n",
+                      __func__, addr, *value, ret);
 
        return ret;
 }
@@ -428,7 +428,7 @@ static void rt711_jack_init(struct rt711_priv *rt711)
                                RT711_HP_JD_FINAL_RESULT_CTL_JD12);
                        break;
                default:
-                       dev_warn(rt711->component->dev, "Wrong JD source\n");
+                       dev_warn(rt711->component->dev, "%s: Wrong JD source\n", __func__);
                        break;
                }
 
@@ -1020,7 +1020,7 @@ static int rt711_pcm_hw_params(struct snd_pcm_substream *substream,
        retval = sdw_stream_add_slave(rt711->slave, &stream_config,
                                        &port_config, 1, sdw_stream);
        if (retval) {
-               dev_err(dai->dev, "Unable to configure port\n");
+               dev_err(dai->dev, "%s: Unable to configure port\n", __func__);
                return retval;
        }
 
@@ -1028,8 +1028,8 @@ static int rt711_pcm_hw_params(struct snd_pcm_substream *substream,
                /* bit 3:0 Number of Channel */
                val |= (params_channels(params) - 1);
        } else {
-               dev_err(component->dev, "Unsupported channels %d\n",
-                       params_channels(params));
+               dev_err(component->dev, "%s: Unsupported channels %d\n",
+                       __func__, params_channels(params));
                return -EINVAL;
        }
 
index 0926b26619bd45b69f5b0b4b5508d677ca207766..012b79e72cf6b64e1e5c2837aaf034237cddcaa6 100644 (file)
@@ -139,8 +139,8 @@ static int rt712_sdca_dmic_index_write(struct rt712_sdca_dmic_priv *rt712,
        ret = regmap_write(regmap, addr, value);
        if (ret < 0)
                dev_err(&rt712->slave->dev,
-                       "Failed to set private value: %06x <= %04x ret=%d\n",
-                       addr, value, ret);
+                       "%s: Failed to set private value: %06x <= %04x ret=%d\n",
+                       __func__, addr, value, ret);
 
        return ret;
 }
@@ -155,8 +155,8 @@ static int rt712_sdca_dmic_index_read(struct rt712_sdca_dmic_priv *rt712,
        ret = regmap_read(regmap, addr, value);
        if (ret < 0)
                dev_err(&rt712->slave->dev,
-                       "Failed to get private value: %06x => %04x ret=%d\n",
-                       addr, *value, ret);
+                       "%s: Failed to get private value: %06x => %04x ret=%d\n",
+                       __func__, addr, *value, ret);
 
        return ret;
 }
@@ -317,7 +317,8 @@ static int rt712_sdca_dmic_set_gain_put(struct snd_kcontrol *kcontrol,
        for (i = 0; i < p->count; i++) {
                err = regmap_write(rt712->mbq_regmap, p->reg_base + i, gain_val[i]);
                if (err < 0)
-                       dev_err(&rt712->slave->dev, "0x%08x can't be set\n", p->reg_base + i);
+                       dev_err(&rt712->slave->dev, "%s: 0x%08x can't be set\n",
+                               __func__, p->reg_base + i);
        }
 
        return changed;
@@ -667,13 +668,13 @@ static int rt712_sdca_dmic_hw_params(struct snd_pcm_substream *substream,
        retval = sdw_stream_add_slave(rt712->slave, &stream_config,
                                        &port_config, 1, sdw_stream);
        if (retval) {
-               dev_err(dai->dev, "Unable to configure port\n");
+               dev_err(dai->dev, "%s: Unable to configure port\n", __func__);
                return retval;
        }
 
        if (params_channels(params) > 4) {
-               dev_err(component->dev, "Unsupported channels %d\n",
-                       params_channels(params));
+               dev_err(component->dev, "%s: Unsupported channels %d\n",
+                       __func__, params_channels(params));
                return -EINVAL;
        }
 
@@ -698,8 +699,8 @@ static int rt712_sdca_dmic_hw_params(struct snd_pcm_substream *substream,
                sampling_rate = RT712_SDCA_RATE_192000HZ;
                break;
        default:
-               dev_err(component->dev, "Rate %d is not supported\n",
-                       params_rate(params));
+               dev_err(component->dev, "%s: Rate %d is not supported\n",
+                       __func__, params_rate(params));
                return -EINVAL;
        }
 
@@ -923,7 +924,8 @@ static int __maybe_unused rt712_sdca_dmic_dev_resume(struct device *dev)
        time = wait_for_completion_timeout(&slave->initialization_complete,
                                msecs_to_jiffies(RT712_PROBE_TIMEOUT));
        if (!time) {
-               dev_err(&slave->dev, "Initialization not complete, timed out\n");
+               dev_err(&slave->dev, "%s: Initialization not complete, timed out\n",
+                       __func__);
                sdw_show_ping_status(slave->bus, true);
 
                return -ETIMEDOUT;
index 01ac555cd79b84a0b1aabe57899b1fedc214a2b5..4e9ab3ef135b34946d37d6280a9afb568cceef51 100644 (file)
@@ -438,20 +438,21 @@ static int __maybe_unused rt712_sdca_dev_resume(struct device *dev)
                return 0;
 
        if (!slave->unattach_request) {
+               mutex_lock(&rt712->disable_irq_lock);
                if (rt712->disable_irq == true) {
-                       mutex_lock(&rt712->disable_irq_lock);
+
                        sdw_write_no_pm(slave, SDW_SCP_SDCA_INTMASK1, SDW_SCP_SDCA_INTMASK_SDCA_0);
                        sdw_write_no_pm(slave, SDW_SCP_SDCA_INTMASK2, SDW_SCP_SDCA_INTMASK_SDCA_8);
                        rt712->disable_irq = false;
-                       mutex_unlock(&rt712->disable_irq_lock);
                }
+               mutex_unlock(&rt712->disable_irq_lock);
                goto regmap_sync;
        }
 
        time = wait_for_completion_timeout(&slave->initialization_complete,
                                msecs_to_jiffies(RT712_PROBE_TIMEOUT));
        if (!time) {
-               dev_err(&slave->dev, "Initialization not complete, timed out\n");
+               dev_err(&slave->dev, "%s: Initialization not complete, timed out\n", __func__);
                sdw_show_ping_status(slave->bus, true);
 
                return -ETIMEDOUT;
index 6954fbe7ec5f3bb79f8693c23f302a7a1003e11e..b503de9fda80e71cbe78e8916a6a7f41286ac5b2 100644 (file)
@@ -34,8 +34,8 @@ static int rt712_sdca_index_write(struct rt712_sdca_priv *rt712,
        ret = regmap_write(regmap, addr, value);
        if (ret < 0)
                dev_err(&rt712->slave->dev,
-                       "Failed to set private value: %06x <= %04x ret=%d\n",
-                       addr, value, ret);
+                       "%s: Failed to set private value: %06x <= %04x ret=%d\n",
+                       __func__, addr, value, ret);
 
        return ret;
 }
@@ -50,8 +50,8 @@ static int rt712_sdca_index_read(struct rt712_sdca_priv *rt712,
        ret = regmap_read(regmap, addr, value);
        if (ret < 0)
                dev_err(&rt712->slave->dev,
-                       "Failed to get private value: %06x => %04x ret=%d\n",
-                       addr, *value, ret);
+                       "%s: Failed to get private value: %06x => %04x ret=%d\n",
+                       __func__, addr, *value, ret);
 
        return ret;
 }
@@ -1060,13 +1060,13 @@ static int rt712_sdca_pcm_hw_params(struct snd_pcm_substream *substream,
        retval = sdw_stream_add_slave(rt712->slave, &stream_config,
                                        &port_config, 1, sdw_stream);
        if (retval) {
-               dev_err(dai->dev, "Unable to configure port\n");
+               dev_err(dai->dev, "%s: Unable to configure port\n", __func__);
                return retval;
        }
 
        if (params_channels(params) > 16) {
-               dev_err(component->dev, "Unsupported channels %d\n",
-                       params_channels(params));
+               dev_err(component->dev, "%s: Unsupported channels %d\n",
+                       __func__, params_channels(params));
                return -EINVAL;
        }
 
@@ -1085,8 +1085,8 @@ static int rt712_sdca_pcm_hw_params(struct snd_pcm_substream *substream,
                sampling_rate = RT712_SDCA_RATE_192000HZ;
                break;
        default:
-               dev_err(component->dev, "Rate %d is not supported\n",
-                       params_rate(params));
+               dev_err(component->dev, "%s: Rate %d is not supported\n",
+                       __func__, params_rate(params));
                return -EINVAL;
        }
 
@@ -1106,7 +1106,7 @@ static int rt712_sdca_pcm_hw_params(struct snd_pcm_substream *substream,
                        sampling_rate);
                break;
        default:
-               dev_err(component->dev, "Wrong DAI id\n");
+               dev_err(component->dev, "%s: Wrong DAI id\n", __func__);
                return -EINVAL;
        }
 
index ab54a67a27ebbfc8fbe19837fa07ed7d084bd429..ee450126106f969588ab52b83434309f8cfb8036 100644 (file)
@@ -237,7 +237,7 @@ static int __maybe_unused rt715_dev_resume(struct device *dev)
        time = wait_for_completion_timeout(&slave->enumeration_complete,
                                           msecs_to_jiffies(RT715_PROBE_TIMEOUT));
        if (!time) {
-               dev_err(&slave->dev, "Enumeration not complete, timed out\n");
+               dev_err(&slave->dev, "%s: Enumeration not complete, timed out\n", __func__);
                sdw_show_ping_status(slave->bus, true);
 
                return -ETIMEDOUT;
index 4533eedd7e189f3b48e36175eb5494f20a6f1be0..3fb7b9adb61de628705d784fbe64e259bf031089 100644 (file)
@@ -41,8 +41,8 @@ static int rt715_sdca_index_write(struct rt715_sdca_priv *rt715,
        ret = regmap_write(regmap, addr, value);
        if (ret < 0)
                dev_err(&rt715->slave->dev,
-                       "Failed to set private value: %08x <= %04x %d\n",
-                       addr, value, ret);
+                       "%s: Failed to set private value: %08x <= %04x %d\n",
+                       __func__, addr, value, ret);
 
        return ret;
 }
@@ -59,8 +59,8 @@ static int rt715_sdca_index_read(struct rt715_sdca_priv *rt715,
        ret = regmap_read(regmap, addr, value);
        if (ret < 0)
                dev_err(&rt715->slave->dev,
-                               "Failed to get private value: %06x => %04x ret=%d\n",
-                               addr, *value, ret);
+                       "%s: Failed to get private value: %06x => %04x ret=%d\n",
+                       __func__, addr, *value, ret);
 
        return ret;
 }
@@ -152,8 +152,8 @@ static int rt715_sdca_set_amp_gain_put(struct snd_kcontrol *kcontrol,
                                mc->shift);
                ret = regmap_write(rt715->mbq_regmap, mc->reg + i, gain_val);
                if (ret != 0) {
-                       dev_err(component->dev, "Failed to write 0x%x=0x%x\n",
-                               mc->reg + i, gain_val);
+                       dev_err(component->dev, "%s: Failed to write 0x%x=0x%x\n",
+                               __func__, mc->reg + i, gain_val);
                        return ret;
                }
        }
@@ -188,8 +188,8 @@ static int rt715_sdca_set_amp_gain_4ch_put(struct snd_kcontrol *kcontrol,
                ret = regmap_write(rt715->mbq_regmap, reg_base + i,
                                gain_val);
                if (ret != 0) {
-                       dev_err(component->dev, "Failed to write 0x%x=0x%x\n",
-                               reg_base + i, gain_val);
+                       dev_err(component->dev, "%s: Failed to write 0x%x=0x%x\n",
+                               __func__, reg_base + i, gain_val);
                        return ret;
                }
        }
@@ -224,8 +224,8 @@ static int rt715_sdca_set_amp_gain_8ch_put(struct snd_kcontrol *kcontrol,
                reg = i < 7 ? reg_base + i : (reg_base - 1) | BIT(15);
                ret = regmap_write(rt715->mbq_regmap, reg, gain_val);
                if (ret != 0) {
-                       dev_err(component->dev, "Failed to write 0x%x=0x%x\n",
-                               reg, gain_val);
+                       dev_err(component->dev, "%s: Failed to write 0x%x=0x%x\n",
+                               __func__, reg, gain_val);
                        return ret;
                }
        }
@@ -246,8 +246,8 @@ static int rt715_sdca_set_amp_gain_get(struct snd_kcontrol *kcontrol,
        for (i = 0; i < 2; i++) {
                ret = regmap_read(rt715->mbq_regmap, mc->reg + i, &val);
                if (ret < 0) {
-                       dev_err(component->dev, "Failed to read 0x%x, ret=%d\n",
-                               mc->reg + i, ret);
+                       dev_err(component->dev, "%s: Failed to read 0x%x, ret=%d\n",
+                               __func__, mc->reg + i, ret);
                        return ret;
                }
                ucontrol->value.integer.value[i] = rt715_sdca_get_gain(val, mc->shift);
@@ -271,8 +271,8 @@ static int rt715_sdca_set_amp_gain_4ch_get(struct snd_kcontrol *kcontrol,
        for (i = 0; i < 4; i++) {
                ret = regmap_read(rt715->mbq_regmap, reg_base + i, &val);
                if (ret < 0) {
-                       dev_err(component->dev, "Failed to read 0x%x, ret=%d\n",
-                               reg_base + i, ret);
+                       dev_err(component->dev, "%s: Failed to read 0x%x, ret=%d\n",
+                               __func__, reg_base + i, ret);
                        return ret;
                }
                ucontrol->value.integer.value[i] = rt715_sdca_get_gain(val, gain_sft);
@@ -297,8 +297,8 @@ static int rt715_sdca_set_amp_gain_8ch_get(struct snd_kcontrol *kcontrol,
        for (i = 0; i < 8; i += 2) {
                ret = regmap_read(rt715->mbq_regmap, reg_base + i, &val_l);
                if (ret < 0) {
-                       dev_err(component->dev, "Failed to read 0x%x, ret=%d\n",
-                                       reg_base + i, ret);
+                       dev_err(component->dev, "%s: Failed to read 0x%x, ret=%d\n",
+                               __func__, reg_base + i, ret);
                        return ret;
                }
                ucontrol->value.integer.value[i] = (val_l >> gain_sft) / 10;
@@ -306,8 +306,8 @@ static int rt715_sdca_set_amp_gain_8ch_get(struct snd_kcontrol *kcontrol,
                reg = (i == 6) ? (reg_base - 1) | BIT(15) : reg_base + 1 + i;
                ret = regmap_read(rt715->mbq_regmap, reg, &val_r);
                if (ret < 0) {
-                       dev_err(component->dev, "Failed to read 0x%x, ret=%d\n",
-                                       reg, ret);
+                       dev_err(component->dev, "%s: Failed to read 0x%x, ret=%d\n",
+                               __func__, reg, ret);
                        return ret;
                }
                ucontrol->value.integer.value[i + 1] = (val_r >> gain_sft) / 10;
@@ -834,15 +834,15 @@ static int rt715_sdca_pcm_hw_params(struct snd_pcm_substream *substream,
                        0xaf00);
                break;
        default:
-               dev_err(component->dev, "Invalid DAI id %d\n", dai->id);
+               dev_err(component->dev, "%s: Invalid DAI id %d\n", __func__, dai->id);
                return -EINVAL;
        }
 
        retval = sdw_stream_add_slave(rt715->slave, &stream_config,
                                        &port_config, 1, sdw_stream);
        if (retval) {
-               dev_err(component->dev, "Unable to configure port, retval:%d\n",
-                       retval);
+               dev_err(component->dev, "%s: Unable to configure port, retval:%d\n",
+                       __func__, retval);
                return retval;
        }
 
@@ -893,8 +893,8 @@ static int rt715_sdca_pcm_hw_params(struct snd_pcm_substream *substream,
                val = 0xf;
                break;
        default:
-               dev_err(component->dev, "Unsupported sample rate %d\n",
-                       params_rate(params));
+               dev_err(component->dev, "%s: Unsupported sample rate %d\n",
+                       __func__, params_rate(params));
                return -EINVAL;
        }
 
index 21f37babd148a487e82568144a791bff07fdf6c0..7e13868ff99f03110c165dcd706cff46a8eeba5d 100644 (file)
@@ -482,7 +482,7 @@ static int rt715_bus_config(struct sdw_slave *slave,
 
        ret = rt715_clock_config(&slave->dev);
        if (ret < 0)
-               dev_err(&slave->dev, "Invalid clk config");
+               dev_err(&slave->dev, "%s: Invalid clk config", __func__);
 
        return 0;
 }
@@ -554,7 +554,7 @@ static int __maybe_unused rt715_dev_resume(struct device *dev)
        time = wait_for_completion_timeout(&slave->initialization_complete,
                                           msecs_to_jiffies(RT715_PROBE_TIMEOUT));
        if (!time) {
-               dev_err(&slave->dev, "Initialization not complete, timed out\n");
+               dev_err(&slave->dev, "%s: Initialization not complete, timed out\n", __func__);
                sdw_show_ping_status(slave->bus, true);
 
                return -ETIMEDOUT;
index 9f732a5abd53f37cd24382522f9dc3ab97ecd7b0..299c9b12377c6ada95a40b4a876a43dd127786be 100644 (file)
@@ -40,8 +40,8 @@ static int rt715_index_write(struct regmap *regmap, unsigned int reg,
 
        ret = regmap_write(regmap, addr, value);
        if (ret < 0) {
-               pr_err("Failed to set private value: %08x <= %04x %d\n",
-                      addr, value, ret);
+               pr_err("%s: Failed to set private value: %08x <= %04x %d\n",
+                      __func__, addr, value, ret);
        }
 
        return ret;
@@ -55,8 +55,8 @@ static int rt715_index_write_nid(struct regmap *regmap,
 
        ret = regmap_write(regmap, addr, value);
        if (ret < 0)
-               pr_err("Failed to set private value: %06x <= %04x ret=%d\n",
-                       addr, value, ret);
+               pr_err("%s: Failed to set private value: %06x <= %04x ret=%d\n",
+                      __func__, addr, value, ret);
 
        return ret;
 }
@@ -70,8 +70,8 @@ static int rt715_index_read_nid(struct regmap *regmap,
        *value = 0;
        ret = regmap_read(regmap, addr, value);
        if (ret < 0)
-               pr_err("Failed to get private value: %06x => %04x ret=%d\n",
-                       addr, *value, ret);
+               pr_err("%s: Failed to get private value: %06x => %04x ret=%d\n",
+                      __func__, addr, *value, ret);
 
        return ret;
 }
@@ -862,14 +862,14 @@ static int rt715_pcm_hw_params(struct snd_pcm_substream *substream,
                rt715_index_write(rt715->regmap, RT715_SDW_INPUT_SEL, 0xa000);
                break;
        default:
-               dev_err(component->dev, "Invalid DAI id %d\n", dai->id);
+               dev_err(component->dev, "%s: Invalid DAI id %d\n", __func__, dai->id);
                return -EINVAL;
        }
 
        retval = sdw_stream_add_slave(rt715->slave, &stream_config,
                                        &port_config, 1, sdw_stream);
        if (retval) {
-               dev_err(dai->dev, "Unable to configure port\n");
+               dev_err(dai->dev, "%s: Unable to configure port\n", __func__);
                return retval;
        }
 
@@ -883,8 +883,8 @@ static int rt715_pcm_hw_params(struct snd_pcm_substream *substream,
                val |= 0x0 << 8;
                break;
        default:
-               dev_err(component->dev, "Unsupported sample rate %d\n",
-                       params_rate(params));
+               dev_err(component->dev, "%s: Unsupported sample rate %d\n",
+                       __func__, params_rate(params));
                return -EINVAL;
        }
 
@@ -892,8 +892,8 @@ static int rt715_pcm_hw_params(struct snd_pcm_substream *substream,
                /* bit 3:0 Number of Channel */
                val |= (params_channels(params) - 1);
        } else {
-               dev_err(component->dev, "Unsupported channels %d\n",
-                       params_channels(params));
+               dev_err(component->dev, "%s: Unsupported channels %d\n",
+                       __func__, params_channels(params));
                return -EINVAL;
        }
 
index eb76f4c675b67fd59df00cb41a17955c751bbd44..65d584c1886e819597577ed10551d2fe5d104e53 100644 (file)
@@ -467,13 +467,13 @@ static int __maybe_unused rt722_sdca_dev_resume(struct device *dev)
                return 0;
 
        if (!slave->unattach_request) {
+               mutex_lock(&rt722->disable_irq_lock);
                if (rt722->disable_irq == true) {
-                       mutex_lock(&rt722->disable_irq_lock);
                        sdw_write_no_pm(slave, SDW_SCP_SDCA_INTMASK1, SDW_SCP_SDCA_INTMASK_SDCA_6);
                        sdw_write_no_pm(slave, SDW_SCP_SDCA_INTMASK2, SDW_SCP_SDCA_INTMASK_SDCA_8);
                        rt722->disable_irq = false;
-                       mutex_unlock(&rt722->disable_irq_lock);
                }
+               mutex_unlock(&rt722->disable_irq_lock);
                goto regmap_sync;
        }
 
index 0e1c65a20392addb92a6bdbc39319884f4d2f9c9..e0ea3a23f7cc6844691338ff8daae7f2843d2c6e 100644 (file)
@@ -35,8 +35,8 @@ int rt722_sdca_index_write(struct rt722_sdca_priv *rt722,
        ret = regmap_write(regmap, addr, value);
        if (ret < 0)
                dev_err(&rt722->slave->dev,
-                       "Failed to set private value: %06x <= %04x ret=%d\n",
-                       addr, value, ret);
+                       "%s: Failed to set private value: %06x <= %04x ret=%d\n",
+                       __func__, addr, value, ret);
 
        return ret;
 }
@@ -51,8 +51,8 @@ int rt722_sdca_index_read(struct rt722_sdca_priv *rt722,
        ret = regmap_read(regmap, addr, value);
        if (ret < 0)
                dev_err(&rt722->slave->dev,
-                       "Failed to get private value: %06x => %04x ret=%d\n",
-                       addr, *value, ret);
+                       "%s: Failed to get private value: %06x => %04x ret=%d\n",
+                       __func__, addr, *value, ret);
 
        return ret;
 }
@@ -663,7 +663,8 @@ static int rt722_sdca_dmic_set_gain_put(struct snd_kcontrol *kcontrol,
        for (i = 0; i < p->count; i++) {
                err = regmap_write(rt722->mbq_regmap, p->reg_base + i, gain_val[i]);
                if (err < 0)
-                       dev_err(&rt722->slave->dev, "%#08x can't be set\n", p->reg_base + i);
+                       dev_err(&rt722->slave->dev, "%s: %#08x can't be set\n",
+                               __func__, p->reg_base + i);
        }
 
        return changed;
@@ -1211,13 +1212,13 @@ static int rt722_sdca_pcm_hw_params(struct snd_pcm_substream *substream,
        retval = sdw_stream_add_slave(rt722->slave, &stream_config,
                                        &port_config, 1, sdw_stream);
        if (retval) {
-               dev_err(dai->dev, "Unable to configure port\n");
+               dev_err(dai->dev, "%s: Unable to configure port\n", __func__);
                return retval;
        }
 
        if (params_channels(params) > 16) {
-               dev_err(component->dev, "Unsupported channels %d\n",
-                       params_channels(params));
+               dev_err(component->dev, "%s: Unsupported channels %d\n",
+                       __func__, params_channels(params));
                return -EINVAL;
        }
 
@@ -1236,8 +1237,8 @@ static int rt722_sdca_pcm_hw_params(struct snd_pcm_substream *substream,
                sampling_rate = RT722_SDCA_RATE_192000HZ;
                break;
        default:
-               dev_err(component->dev, "Rate %d is not supported\n",
-                       params_rate(params));
+               dev_err(component->dev, "%s: Rate %d is not supported\n",
+                       __func__, params_rate(params));
                return -EINVAL;
        }
 
index e451c009f2d99980bab20dd5d4c55cc26bd73cd5..7d5c096e06cd32b77fc6b73f18002af63bd6c8d5 100644 (file)
@@ -683,11 +683,12 @@ static void wm_adsp_control_remove(struct cs_dsp_coeff_ctl *cs_ctl)
 int wm_adsp_write_ctl(struct wm_adsp *dsp, const char *name, int type,
                      unsigned int alg, void *buf, size_t len)
 {
-       struct cs_dsp_coeff_ctl *cs_ctl = cs_dsp_get_ctl(&dsp->cs_dsp, name, type, alg);
+       struct cs_dsp_coeff_ctl *cs_ctl;
        struct wm_coeff_ctl *ctl;
        int ret;
 
        mutex_lock(&dsp->cs_dsp.pwr_lock);
+       cs_ctl = cs_dsp_get_ctl(&dsp->cs_dsp, name, type, alg);
        ret = cs_dsp_coeff_write_ctrl(cs_ctl, 0, buf, len);
        mutex_unlock(&dsp->cs_dsp.pwr_lock);
 
index c018f84fe02529322455035e6ca4fff7ddf2afaf..fc072dc58968cb80d6481cb0c84a4b776bdef150 100644 (file)
@@ -296,5 +296,6 @@ static struct platform_driver avs_da7219_driver = {
 
 module_platform_driver(avs_da7219_driver);
 
+MODULE_DESCRIPTION("Intel da7219 machine driver");
 MODULE_AUTHOR("Cezary Rojewski <cezary.rojewski@intel.com>");
 MODULE_LICENSE("GPL");
index ba2bc7f689eb603051870bfcbf28dc5640b7ac66..d9e5e85f523358d26a85218c5050fe8b31876e21 100644 (file)
@@ -96,4 +96,5 @@ static struct platform_driver avs_dmic_driver = {
 
 module_platform_driver(avs_dmic_driver);
 
+MODULE_DESCRIPTION("Intel DMIC machine driver");
 MODULE_LICENSE("GPL");
index 1090082e7d5bfcd47e92ecbd6bed22269fab3678..5c90a60075773409431f957f05f2e3bc03303334 100644 (file)
@@ -326,4 +326,5 @@ static struct platform_driver avs_es8336_driver = {
 
 module_platform_driver(avs_es8336_driver);
 
+MODULE_DESCRIPTION("Intel es8336 machine driver");
 MODULE_LICENSE("GPL");
index 28f254eb0d03fcfa6f5fc8c4bd0184d73f9c298d..027373d6a16d602c62b07d34b2b0bd984c14ae78 100644 (file)
@@ -204,4 +204,5 @@ static struct platform_driver avs_i2s_test_driver = {
 
 module_platform_driver(avs_i2s_test_driver);
 
+MODULE_DESCRIPTION("Intel i2s test machine driver");
 MODULE_LICENSE("GPL");
index a83b95f25129f90e1a0fd6f5d34e7e6fa799d34f..1ff85e4d8e160b7c61a74fc6dbdf9a32ee410614 100644 (file)
@@ -154,4 +154,5 @@ static struct platform_driver avs_max98357a_driver = {
 
 module_platform_driver(avs_max98357a_driver)
 
+MODULE_DESCRIPTION("Intel max98357a machine driver");
 MODULE_LICENSE("GPL");
index 3b980a025e6f697446419f6efec4f071e45495cb..8d31586b73eaec7c10edb319002f1b024e41480b 100644 (file)
@@ -211,4 +211,5 @@ static struct platform_driver avs_max98373_driver = {
 
 module_platform_driver(avs_max98373_driver)
 
+MODULE_DESCRIPTION("Intel max98373 machine driver");
 MODULE_LICENSE("GPL");
index 86dd2b228df3a5ce1a2751221659834cb458c775..572ec58073d06bce6e07bc7dbd05f2b9a3cf5462 100644 (file)
@@ -208,4 +208,5 @@ static struct platform_driver avs_max98927_driver = {
 
 module_platform_driver(avs_max98927_driver)
 
+MODULE_DESCRIPTION("Intel max98927 machine driver");
 MODULE_LICENSE("GPL");
index 1c1e2083f474df122259c41f774e246dd7223a1f..55db75efae41425684bdd1c67441e76cae4d4062 100644 (file)
@@ -313,4 +313,5 @@ static struct platform_driver avs_nau8825_driver = {
 
 module_platform_driver(avs_nau8825_driver)
 
+MODULE_DESCRIPTION("Intel nau8825 machine driver");
 MODULE_LICENSE("GPL");
index a9469b5ecb402f1af389c52fd6e1e5d9991cc3e9..8be6887bbc6e81cb0f6af16685524fd01b96e36a 100644 (file)
@@ -69,4 +69,5 @@ static struct platform_driver avs_probe_mb_driver = {
 
 module_platform_driver(avs_probe_mb_driver);
 
+MODULE_DESCRIPTION("Intel probe machine driver");
 MODULE_LICENSE("GPL");
index bfcb8845fd15d06ec39d7360008ac60f73491d3b..1cf52421608753e1cca23d333eab4a7a9d624b63 100644 (file)
@@ -276,4 +276,5 @@ static struct platform_driver avs_rt274_driver = {
 
 module_platform_driver(avs_rt274_driver);
 
+MODULE_DESCRIPTION("Intel rt274 machine driver");
 MODULE_LICENSE("GPL");
index 28d7d86b1cc99dabed8c76a94ee2c5dc3064582e..4740bba1057032128c60594b9339b820f9f7bc70 100644 (file)
@@ -247,4 +247,5 @@ static struct platform_driver avs_rt286_driver = {
 
 module_platform_driver(avs_rt286_driver);
 
+MODULE_DESCRIPTION("Intel rt286 machine driver");
 MODULE_LICENSE("GPL");
index 80f490b9e11842c34859ff8b6cebc8b7cee51e71..6e409e29f6974654a0a5cbe4eba105f78c055164 100644 (file)
@@ -266,4 +266,5 @@ static struct platform_driver avs_rt298_driver = {
 
 module_platform_driver(avs_rt298_driver);
 
+MODULE_DESCRIPTION("Intel rt298 machine driver");
 MODULE_LICENSE("GPL");
index 60105f453ae235c6affdcfef2181b95b101a3e1c..097ae5f73241efea14cf187e85bc88d936be1956 100644 (file)
@@ -192,4 +192,5 @@ static struct platform_driver avs_rt5514_driver = {
 
 module_platform_driver(avs_rt5514_driver);
 
+MODULE_DESCRIPTION("Intel rt5514 machine driver");
 MODULE_LICENSE("GPL");
index b4762c2a7bf2d1a3b0237479145380243861c4d8..1880c315cc4d1f9be4b7f42113382b322bd9c4cd 100644 (file)
@@ -265,4 +265,5 @@ static struct platform_driver avs_rt5663_driver = {
 
 module_platform_driver(avs_rt5663_driver);
 
+MODULE_DESCRIPTION("Intel rt5663 machine driver");
 MODULE_LICENSE("GPL");
index 243f979fda98a4d6e67b76cef8b5a192dd6a8ff7..594a971ded9eb2ea339ab2e45ae84da8b4b1dd6d 100644 (file)
@@ -341,5 +341,6 @@ static struct platform_driver avs_rt5682_driver = {
 
 module_platform_driver(avs_rt5682_driver)
 
+MODULE_DESCRIPTION("Intel rt5682 machine driver");
 MODULE_AUTHOR("Cezary Rojewski <cezary.rojewski@intel.com>");
 MODULE_LICENSE("GPL");
index 4a0e136835ff5d05118b1d802d2884beabb68a95..d6f7f046c24e5d12bd3189fe800bb05b18ee4444 100644 (file)
@@ -200,4 +200,5 @@ static struct platform_driver avs_ssm4567_driver = {
 
 module_platform_driver(avs_ssm4567_driver)
 
+MODULE_DESCRIPTION("Intel ssm4567 machine driver");
 MODULE_LICENSE("GPL");
index 2d25748ca70662bf771c6896297ccb6a0fb0798f..b27e89ff6a1673f57db6e253a818d6fbe3d1ab91 100644 (file)
@@ -263,7 +263,7 @@ int snd_soc_get_volsw(struct snd_kcontrol *kcontrol,
        int max = mc->max;
        int min = mc->min;
        int sign_bit = mc->sign_bit;
-       unsigned int mask = (1 << fls(max)) - 1;
+       unsigned int mask = (1ULL << fls(max)) - 1;
        unsigned int invert = mc->invert;
        int val;
        int ret;
index be7dc1e02284ab62f8cbaeffdd70f26a19ff6232..c12c7f820529476de0273474082b8174ab0ae052 100644 (file)
@@ -704,6 +704,10 @@ int amd_sof_acp_probe(struct snd_sof_dev *sdev)
                goto unregister_dev;
        }
 
+       ret = acp_init(sdev);
+       if (ret < 0)
+               goto free_smn_dev;
+
        sdev->ipc_irq = pci->irq;
        ret = request_threaded_irq(sdev->ipc_irq, acp_irq_handler, acp_irq_thread,
                                   IRQF_SHARED, "AudioDSP", sdev);
@@ -713,10 +717,6 @@ int amd_sof_acp_probe(struct snd_sof_dev *sdev)
                goto free_smn_dev;
        }
 
-       ret = acp_init(sdev);
-       if (ret < 0)
-               goto free_ipc_irq;
-
        /* scan SoundWire capabilities exposed by DSDT */
        ret = acp_sof_scan_sdw_devices(sdev, chip->sdw_acpi_dev_addr);
        if (ret < 0) {
index 9b00ede2a486a2ff2d619ab714ed2c1665eb3463..cc84d4c81be9d363d701b1d5c658e26a62079435 100644 (file)
@@ -339,8 +339,7 @@ static int sof_init_environment(struct snd_sof_dev *sdev)
        ret = snd_sof_probe(sdev);
        if (ret < 0) {
                dev_err(sdev->dev, "failed to probe DSP %d\n", ret);
-               sof_ops_free(sdev);
-               return ret;
+               goto err_sof_probe;
        }
 
        /* check machine info */
@@ -358,15 +357,18 @@ static int sof_init_environment(struct snd_sof_dev *sdev)
                ret = validate_sof_ops(sdev);
                if (ret < 0) {
                        snd_sof_remove(sdev);
+                       snd_sof_remove_late(sdev);
                        return ret;
                }
        }
 
+       return 0;
+
 err_machine_check:
-       if (ret) {
-               snd_sof_remove(sdev);
-               sof_ops_free(sdev);
-       }
+       snd_sof_remove(sdev);
+err_sof_probe:
+       snd_sof_remove_late(sdev);
+       sof_ops_free(sdev);
 
        return ret;
 }
index 2b385cddc385c5bd59e11acfe8e6bda45704fdd1..d71bb66b9991164cdb8b0ed000e461d9e3a0719c 100644 (file)
@@ -57,6 +57,9 @@ struct snd_sof_dsp_ops sof_hda_common_ops = {
        .pcm_pointer    = hda_dsp_pcm_pointer,
        .pcm_ack        = hda_dsp_pcm_ack,
 
+       .get_dai_frame_counter = hda_dsp_get_stream_llp,
+       .get_host_byte_counter = hda_dsp_get_stream_ldp,
+
        /* firmware loading */
        .load_firmware = snd_sof_load_firmware_raw,
 
index c50ca9e72d37385816ddb3cd6ef7456ed50a58e9..b073720b4cf432466e18bf8840dd87eb5efac98e 100644 (file)
@@ -7,6 +7,7 @@
 
 #include <sound/pcm_params.h>
 #include <sound/hdaudio_ext.h>
+#include <sound/hda_register.h>
 #include <sound/hda-mlink.h>
 #include <sound/sof/ipc4/header.h>
 #include <uapi/sound/sof/header.h>
@@ -362,6 +363,16 @@ static int hda_trigger(struct snd_sof_dev *sdev, struct snd_soc_dai *cpu_dai,
        case SNDRV_PCM_TRIGGER_STOP:
        case SNDRV_PCM_TRIGGER_PAUSE_PUSH:
                snd_hdac_ext_stream_clear(hext_stream);
+
+               /*
+                * Save the LLP registers in case the stream is
+                * restarting due to PAUSE_RELEASE, or START without a pcm
+                * close/open, since in this case the LLP register is not reset
+                * to 0 and the delay calculation would return invalid
+                * results.
+                */
+               hext_stream->pplcllpl = readl(hext_stream->pplc_addr + AZX_REG_PPLCLLPL);
+               hext_stream->pplcllpu = readl(hext_stream->pplc_addr + AZX_REG_PPLCLLPU);
                break;
        default:
                dev_err(sdev->dev, "unknown trigger command %d\n", cmd);
index 31ffa1a8f2ac04ddd5c31aadec5400c52757dd19..ef5c915db8ffb47a622a8c753f14dd950fb9c45c 100644 (file)
@@ -681,17 +681,27 @@ static int hda_suspend(struct snd_sof_dev *sdev, bool runtime_suspend)
        struct sof_intel_hda_dev *hda = sdev->pdata->hw_pdata;
        const struct sof_intel_dsp_desc *chip = hda->desc;
        struct hdac_bus *bus = sof_to_bus(sdev);
+       bool imr_lost = false;
        int ret, j;
 
        /*
-        * The memory used for IMR boot loses its content in deeper than S3 state
-        * We must not try IMR boot on next power up (as it will fail).
-        *
+        * The memory used for IMR boot loses its content in deeper than S3
+        * state on CAVS platforms.
+        * On ACE platforms, due to the system architecture, the IMR content is
+        * lost already at S3 state, as they are tailored for s2idle use.
+        * We must not try IMR boot on next power up in these cases as it will
+        * fail.
+        */
+       if (sdev->system_suspend_target > SOF_SUSPEND_S3 ||
+           (chip->hw_ip_version >= SOF_INTEL_ACE_1_0 &&
+            sdev->system_suspend_target == SOF_SUSPEND_S3))
+               imr_lost = true;
+
+       /*
         * In case of firmware crash or boot failure set the skip_imr_boot to true
         * as well in order to try to re-load the firmware to do a 'cold' boot.
         */
-       if (sdev->system_suspend_target > SOF_SUSPEND_S3 ||
-           sdev->fw_state == SOF_FW_CRASHED ||
+       if (imr_lost || sdev->fw_state == SOF_FW_CRASHED ||
            sdev->fw_state == SOF_FW_BOOT_FAILED)
                hda->skip_imr_boot = true;
 
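The suspend-state check in the hunk above can be sketched as a standalone predicate. This is a minimal illustration, not the kernel code: the enum values are placeholders, and the `is_ace` flag stands in for the `chip->hw_ip_version >= SOF_INTEL_ACE_1_0` comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative suspend targets; values are placeholders, not the
 * kernel's actual enum. */
enum { SOF_SUSPEND_S3 = 2, SOF_SUSPEND_S4 = 3 };

/*
 * IMR content is lost (so IMR boot must be skipped on next power up)
 * when suspending deeper than S3, or already at S3 on ACE platforms.
 */
static bool imr_lost(int suspend_target, bool is_ace)
{
	if (suspend_target > SOF_SUSPEND_S3)
		return true;
	return is_ace && suspend_target == SOF_SUSPEND_S3;
}
```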
index 18f07364d2198425bffd3e111a546e11b536cd63..d7b446f3f973e3d532d8eaef241aac2f3a30a54d 100644 (file)
@@ -259,8 +259,37 @@ int hda_dsp_pcm_open(struct snd_sof_dev *sdev,
                snd_pcm_hw_constraint_mask64(substream->runtime, SNDRV_PCM_HW_PARAM_FORMAT,
                                             SNDRV_PCM_FMTBIT_S16 | SNDRV_PCM_FMTBIT_S32);
 
+       /*
+        * The dsp_max_burst_size_in_ms is the length of the maximum burst size
+        * of the host DMA in the ALSA buffer.
+        *
+        * On playback start the DMA will transfer dsp_max_burst_size_in_ms
+        * amount of data in one initial burst to fill up the host DMA buffer.
+        * Consequent DMA burst sizes are shorter and their length can vary.
+        * To make sure that userspace allocates a large enough ALSA buffer, we
+        * need to place a constraint on the buffer time.
+        *
+        * On capture the DMA will transfer 1ms chunks.
+        *
+        * Exact dsp_max_burst_size_in_ms constraint is racy, so set the
+        * constraint to a minimum of 2x dsp_max_burst_size_in_ms.
+        */
+       if (spcm->stream[direction].dsp_max_burst_size_in_ms)
+               snd_pcm_hw_constraint_minmax(substream->runtime,
+                       SNDRV_PCM_HW_PARAM_BUFFER_TIME,
+                       spcm->stream[direction].dsp_max_burst_size_in_ms * USEC_PER_MSEC * 2,
+                       UINT_MAX);
+
        /* binding pcm substream to hda stream */
        substream->runtime->private_data = &dsp_stream->hstream;
+
+       /*
+        * Reset the llp cache values (they are used for LLP compensation in
+        * case the counter is not reset)
+        */
+       dsp_stream->pplcllpl = 0;
+       dsp_stream->pplcllpu = 0;
+
        return 0;
 }
 
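The buffer-time constraint computed above reduces to a small formula; a minimal sketch of it (the parameter name is taken from the diff, and the 2x margin mirrors the value passed to `snd_pcm_hw_constraint_minmax()`):

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_MSEC 1000ULL

/*
 * Minimum ALSA buffer time in microseconds: twice the maximum host-DMA
 * burst size, so the initial burst cannot overrun the buffer.
 */
static uint64_t min_buffer_time_us(uint32_t dsp_max_burst_size_in_ms)
{
	return (uint64_t)dsp_max_burst_size_in_ms * USEC_PER_MSEC * 2;
}
```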
index b387b1a69d7ea3ceaed9fe814b174d9040e3eae1..0c189d3b19c1af6448d5d1264802ef493e5c7b14 100644 (file)
@@ -1063,3 +1063,73 @@ snd_pcm_uframes_t hda_dsp_stream_get_position(struct hdac_stream *hstream,
 
        return pos;
 }
+
+#define merge_u64(u32_u, u32_l) (((u64)(u32_u) << 32) | (u32_l))
+
+/**
+ * hda_dsp_get_stream_llp - Retrieve the LLP (Linear Link Position) of the stream
+ * @sdev: SOF device
+ * @component: ASoC component
+ * @substream: PCM substream
+ *
+ * Returns the raw Linear Link Position value
+ */
+u64 hda_dsp_get_stream_llp(struct snd_sof_dev *sdev,
+                          struct snd_soc_component *component,
+                          struct snd_pcm_substream *substream)
+{
+       struct hdac_stream *hstream = substream->runtime->private_data;
+       struct hdac_ext_stream *hext_stream = stream_to_hdac_ext_stream(hstream);
+       u32 llp_l, llp_u;
+
+       /*
+        * The pplc_addr has been calculated during probe in
+        * hda_dsp_stream_init():
+        * pplc_addr = sdev->bar[HDA_DSP_PP_BAR] +
+        *             SOF_HDA_PPLC_BASE +
+        *             SOF_HDA_PPLC_MULTI * total_stream +
+        *             SOF_HDA_PPLC_INTERVAL * stream_index
+        *
+        * Use this pre-calculated address to avoid repeated re-calculation.
+        */
+       llp_l = readl(hext_stream->pplc_addr + AZX_REG_PPLCLLPL);
+       llp_u = readl(hext_stream->pplc_addr + AZX_REG_PPLCLLPU);
+
+       /* Compensate the LLP counter with the saved offset */
+       if (hext_stream->pplcllpl || hext_stream->pplcllpu)
+               return merge_u64(llp_u, llp_l) -
+                      merge_u64(hext_stream->pplcllpu, hext_stream->pplcllpl);
+
+       return merge_u64(llp_u, llp_l);
+}
+
+/**
+ * hda_dsp_get_stream_ldp - Retrieve the LDP (Linear DMA Position) of the stream
+ * @sdev: SOF device
+ * @component: ASoC component
+ * @substream: PCM substream
+ *
+ * Returns the raw Linear DMA Position value
+ */
+u64 hda_dsp_get_stream_ldp(struct snd_sof_dev *sdev,
+                          struct snd_soc_component *component,
+                          struct snd_pcm_substream *substream)
+{
+       struct hdac_stream *hstream = substream->runtime->private_data;
+       struct hdac_ext_stream *hext_stream = stream_to_hdac_ext_stream(hstream);
+       u32 ldp_l, ldp_u;
+
+       /*
+        * The pphc_addr has been calculated during probe in
+        * hda_dsp_stream_init():
+        * pphc_addr = sdev->bar[HDA_DSP_PP_BAR] +
+        *             SOF_HDA_PPHC_BASE +
+        *             SOF_HDA_PPHC_INTERVAL * stream_index
+        *
+        * Use this pre-calculated address to avoid repeated re-calculation.
+        */
+       ldp_l = readl(hext_stream->pphc_addr + AZX_REG_PPHCLDPL);
+       ldp_u = readl(hext_stream->pphc_addr + AZX_REG_PPHCLDPU);
+
+       return ((u64)ldp_u << 32) | ldp_l;
+}
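The register-merging and compensation pattern used by `hda_dsp_get_stream_llp()` above can be sketched in isolation; plain integer arguments stand in for the MMIO reads of the upper/lower halves and the saved snapshot:

```c
#include <assert.h>
#include <stdint.h>

/* Combine upper/lower 32-bit register halves into one 64-bit counter,
 * as the merge_u64() macro above does. */
static uint64_t merge_u64(uint32_t u32_u, uint32_t u32_l)
{
	return ((uint64_t)u32_u << 32) | u32_l;
}

/* Subtract the LLP snapshot saved at trigger time, if one was taken,
 * mirroring the compensation in hda_dsp_get_stream_llp(). */
static uint64_t llp_compensated(uint32_t llp_u, uint32_t llp_l,
				uint32_t save_u, uint32_t save_l)
{
	uint64_t llp = merge_u64(llp_u, llp_l);

	if (save_u || save_l)
		llp -= merge_u64(save_u, save_l);
	return llp;
}
```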
index b36eb7c7891335a3038d5e1402d6f73ede754b81..81a1d4606d3cde8ecb1b9b2ef859c7b0393555f5 100644 (file)
@@ -662,6 +662,12 @@ bool hda_dsp_check_stream_irq(struct snd_sof_dev *sdev);
 
 snd_pcm_uframes_t hda_dsp_stream_get_position(struct hdac_stream *hstream,
                                              int direction, bool can_sleep);
+u64 hda_dsp_get_stream_llp(struct snd_sof_dev *sdev,
+                          struct snd_soc_component *component,
+                          struct snd_pcm_substream *substream);
+u64 hda_dsp_get_stream_ldp(struct snd_sof_dev *sdev,
+                          struct snd_soc_component *component,
+                          struct snd_pcm_substream *substream);
 
 struct hdac_ext_stream *
        hda_dsp_stream_get(struct snd_sof_dev *sdev, int direction, u32 flags);
index 7ae017a00184e52371052c188d75f13fbfc053df..aeb4350cce6bba3b229af876e188c4ead7f2b201 100644 (file)
@@ -29,15 +29,17 @@ static const struct snd_sof_debugfs_map lnl_dsp_debugfs[] = {
 };
 
 /* this helper allows the DSP to set up DMIC/SSP */
-static int hdac_bus_offload_dmic_ssp(struct hdac_bus *bus)
+static int hdac_bus_offload_dmic_ssp(struct hdac_bus *bus, bool enable)
 {
        int ret;
 
-       ret = hdac_bus_eml_enable_offload(bus, true,  AZX_REG_ML_LEPTR_ID_INTEL_SSP, true);
+       ret = hdac_bus_eml_enable_offload(bus, true,
+                                         AZX_REG_ML_LEPTR_ID_INTEL_SSP, enable);
        if (ret < 0)
                return ret;
 
-       ret = hdac_bus_eml_enable_offload(bus, true,  AZX_REG_ML_LEPTR_ID_INTEL_DMIC, true);
+       ret = hdac_bus_eml_enable_offload(bus, true,
+                                         AZX_REG_ML_LEPTR_ID_INTEL_DMIC, enable);
        if (ret < 0)
                return ret;
 
@@ -52,7 +54,19 @@ static int lnl_hda_dsp_probe(struct snd_sof_dev *sdev)
        if (ret < 0)
                return ret;
 
-       return hdac_bus_offload_dmic_ssp(sof_to_bus(sdev));
+       return hdac_bus_offload_dmic_ssp(sof_to_bus(sdev), true);
+}
+
+static void lnl_hda_dsp_remove(struct snd_sof_dev *sdev)
+{
+       int ret;
+
+       ret = hdac_bus_offload_dmic_ssp(sof_to_bus(sdev), false);
+       if (ret < 0)
+               dev_warn(sdev->dev,
+                        "Failed to disable offload for DMIC/SSP: %d\n", ret);
+
+       hda_dsp_remove(sdev);
 }
 
 static int lnl_hda_dsp_resume(struct snd_sof_dev *sdev)
@@ -63,7 +77,7 @@ static int lnl_hda_dsp_resume(struct snd_sof_dev *sdev)
        if (ret < 0)
                return ret;
 
-       return hdac_bus_offload_dmic_ssp(sof_to_bus(sdev));
+       return hdac_bus_offload_dmic_ssp(sof_to_bus(sdev), true);
 }
 
 static int lnl_hda_dsp_runtime_resume(struct snd_sof_dev *sdev)
@@ -74,7 +88,7 @@ static int lnl_hda_dsp_runtime_resume(struct snd_sof_dev *sdev)
        if (ret < 0)
                return ret;
 
-       return hdac_bus_offload_dmic_ssp(sof_to_bus(sdev));
+       return hdac_bus_offload_dmic_ssp(sof_to_bus(sdev), true);
 }
 
 static int lnl_dsp_post_fw_run(struct snd_sof_dev *sdev)
@@ -97,9 +111,11 @@ int sof_lnl_ops_init(struct snd_sof_dev *sdev)
        /* common defaults */
        memcpy(&sof_lnl_ops, &sof_hda_common_ops, sizeof(struct snd_sof_dsp_ops));
 
-       /* probe */
-       if (!sdev->dspless_mode_selected)
+       /* probe/remove */
+       if (!sdev->dspless_mode_selected) {
                sof_lnl_ops.probe = lnl_hda_dsp_probe;
+               sof_lnl_ops.remove = lnl_hda_dsp_remove;
+       }
 
        /* shutdown */
        sof_lnl_ops.shutdown = hda_dsp_shutdown;
@@ -134,8 +150,6 @@ int sof_lnl_ops_init(struct snd_sof_dev *sdev)
                sof_lnl_ops.runtime_resume = lnl_hda_dsp_runtime_resume;
        }
 
-       sof_lnl_ops.get_stream_position = mtl_dsp_get_stream_hda_link_position;
-
        /* dsp core get/put */
        sof_lnl_ops.core_get = mtl_dsp_core_get;
        sof_lnl_ops.core_put = mtl_dsp_core_put;
index df05dc77b8d5e3bef5f3e55ea7e82837b5a89504..060c34988e90d122caf12cc30fe42ba5f1d0c87d 100644 (file)
@@ -626,18 +626,6 @@ static int mtl_dsp_disable_interrupts(struct snd_sof_dev *sdev)
        return mtl_enable_interrupts(sdev, false);
 }
 
-u64 mtl_dsp_get_stream_hda_link_position(struct snd_sof_dev *sdev,
-                                        struct snd_soc_component *component,
-                                        struct snd_pcm_substream *substream)
-{
-       struct hdac_stream *hstream = substream->runtime->private_data;
-       u32 llp_l, llp_u;
-
-       llp_l = snd_sof_dsp_read(sdev, HDA_DSP_HDA_BAR, MTL_PPLCLLPL(hstream->index));
-       llp_u = snd_sof_dsp_read(sdev, HDA_DSP_HDA_BAR, MTL_PPLCLLPU(hstream->index));
-       return ((u64)llp_u << 32) | llp_l;
-}
-
 int mtl_dsp_core_get(struct snd_sof_dev *sdev, int core)
 {
        const struct sof_ipc_pm_ops *pm_ops = sdev->ipc->ops->pm;
@@ -707,8 +695,6 @@ int sof_mtl_ops_init(struct snd_sof_dev *sdev)
        sof_mtl_ops.core_get = mtl_dsp_core_get;
        sof_mtl_ops.core_put = mtl_dsp_core_put;
 
-       sof_mtl_ops.get_stream_position = mtl_dsp_get_stream_hda_link_position;
-
        sdev->private = kzalloc(sizeof(struct sof_ipc4_fw_data), GFP_KERNEL);
        if (!sdev->private)
                return -ENOMEM;
index cc5a1f46fd09560e9fefc10d6b4775b82294bfd4..ea8c1b83f7127d58f76bd5db018eeb0f0d9b1d7f 100644 (file)
@@ -6,12 +6,6 @@
  * Copyright(c) 2020-2022 Intel Corporation. All rights reserved.
  */
 
-/* HDA Registers */
-#define MTL_PPLCLLPL_BASE              0x948
-#define MTL_PPLCLLPU_STRIDE            0x10
-#define MTL_PPLCLLPL(x)                        (MTL_PPLCLLPL_BASE + (x) * MTL_PPLCLLPU_STRIDE)
-#define MTL_PPLCLLPU(x)                        (MTL_PPLCLLPL_BASE + 0x4 + (x) * MTL_PPLCLLPU_STRIDE)
-
 /* DSP Registers */
 #define MTL_HFDSSCS                    0x1000
 #define MTL_HFDSSCS_SPA_MASK           BIT(16)
@@ -103,9 +97,5 @@ int mtl_dsp_ipc_get_window_offset(struct snd_sof_dev *sdev, u32 id);
 
 void mtl_ipc_dump(struct snd_sof_dev *sdev);
 
-u64 mtl_dsp_get_stream_hda_link_position(struct snd_sof_dev *sdev,
-                                        struct snd_soc_component *component,
-                                        struct snd_pcm_substream *substream);
-
 int mtl_dsp_core_get(struct snd_sof_dev *sdev, int core);
 int mtl_dsp_core_put(struct snd_sof_dev *sdev, int core);
index 9f1e33ee8826123cdc57bd0da78b876e79bf6f27..0e04bea9432ddab2e60b2f61d209689d560b39fb 100644 (file)
@@ -4,6 +4,7 @@
 
 #include <linux/debugfs.h>
 #include <linux/sched/signal.h>
+#include <linux/sched/clock.h>
 #include <sound/sof/ipc4/header.h>
 #include "sof-priv.h"
 #include "ipc4-priv.h"
@@ -412,7 +413,6 @@ static int ipc4_mtrace_enable(struct snd_sof_dev *sdev)
        const struct sof_ipc_ops *iops = sdev->ipc->ops;
        struct sof_ipc4_msg msg;
        u64 system_time;
-       ktime_t kt;
        int ret;
 
        if (priv->mtrace_state != SOF_MTRACE_DISABLED)
@@ -424,9 +424,12 @@ static int ipc4_mtrace_enable(struct snd_sof_dev *sdev)
        msg.primary |= SOF_IPC4_MOD_INSTANCE(SOF_IPC4_MOD_INIT_BASEFW_INSTANCE_ID);
        msg.extension = SOF_IPC4_MOD_EXT_MSG_PARAM_ID(SOF_IPC4_FW_PARAM_SYSTEM_TIME);
 
-       /* The system time is in usec, UTC, epoch is 1601-01-01 00:00:00 */
-       kt = ktime_add_us(ktime_get_real(), FW_EPOCH_DELTA * USEC_PER_SEC);
-       system_time = ktime_to_us(kt);
+       /*
+        * local_clock() is used to align with dmesg, so both kernel and firmware logs have
+        * the same base and a minor delta due to the IPC. System time is in us
+        * format but local_clock() returns the time in ns, so convert to us.
+        */
+       system_time = div64_u64(local_clock(), NSEC_PER_USEC);
        msg.data_size = sizeof(system_time);
        msg.data_ptr = &system_time;
        ret = iops->set_get_data(sdev, &msg, msg.data_size, true);
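The timestamp conversion above is a single division; sketched here with plain 64-bit division standing in for the kernel's `div64_u64()`:

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL

/* Convert a nanosecond timestamp (what local_clock() returns) to the
 * microsecond resolution the firmware's SYSTEM_TIME parameter expects. */
static uint64_t ns_to_us(uint64_t ns)
{
	return ns / NSEC_PER_USEC;
}
```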
index 0f332c8cdbe6afe6fc9449b48194ebb749db5d3d..e915f9f87a6c35d74f1cf7096accca70dce688da 100644 (file)
 #include "ipc4-topology.h"
 #include "ipc4-fw-reg.h"
 
+/**
+ * struct sof_ipc4_timestamp_info - IPC4 timestamp info
+ * @host_copier: the host copier of the pcm stream
+ * @dai_copier: the dai copier of the pcm stream
+ * @stream_start_offset: reported by fw in memory window (converted to frames)
+ * @stream_end_offset: reported by fw in memory window (converted to frames)
+ * @llp_offset: llp offset in memory window
+ * @boundary: wrap boundary to be used for the LLP frame counter
+ * @delay: Calculated and stored in pointer callback. The stored value is
+ *        returned in the delay callback.
+ */
+struct sof_ipc4_timestamp_info {
+       struct sof_ipc4_copier *host_copier;
+       struct sof_ipc4_copier *dai_copier;
+       u64 stream_start_offset;
+       u64 stream_end_offset;
+       u32 llp_offset;
+
+       u64 boundary;
+       snd_pcm_sframes_t delay;
+};
+
 static int sof_ipc4_set_multi_pipeline_state(struct snd_sof_dev *sdev, u32 state,
                                             struct ipc4_pipeline_set_state_data *trigger_list)
 {
@@ -423,8 +445,19 @@ static int sof_ipc4_trigger_pipelines(struct snd_soc_component *component,
        }
 
        /* return if this is the final state */
-       if (state == SOF_IPC4_PIPE_PAUSED)
+       if (state == SOF_IPC4_PIPE_PAUSED) {
+               struct sof_ipc4_timestamp_info *time_info;
+
+               /*
+                * Invalidate the stream_start_offset to make sure that it is
+                * going to be updated if the stream resumes
+                */
+               time_info = spcm->stream[substream->stream].private;
+               if (time_info)
+                       time_info->stream_start_offset = SOF_IPC4_INVALID_STREAM_POSITION;
+
                goto free;
+       }
 skip_pause_transition:
        /* else set the RUNNING/RESET state in the DSP */
        ret = sof_ipc4_set_multi_pipeline_state(sdev, state, trigger_list);
@@ -464,14 +497,12 @@ static int sof_ipc4_pcm_trigger(struct snd_soc_component *component,
 
        /* determine the pipeline state */
        switch (cmd) {
-       case SNDRV_PCM_TRIGGER_PAUSE_PUSH:
-               state = SOF_IPC4_PIPE_PAUSED;
-               break;
        case SNDRV_PCM_TRIGGER_PAUSE_RELEASE:
        case SNDRV_PCM_TRIGGER_RESUME:
        case SNDRV_PCM_TRIGGER_START:
                state = SOF_IPC4_PIPE_RUNNING;
                break;
+       case SNDRV_PCM_TRIGGER_PAUSE_PUSH:
        case SNDRV_PCM_TRIGGER_SUSPEND:
        case SNDRV_PCM_TRIGGER_STOP:
                state = SOF_IPC4_PIPE_PAUSED;
@@ -703,6 +734,10 @@ static int sof_ipc4_pcm_setup(struct snd_sof_dev *sdev, struct snd_sof_pcm *spcm
        if (abi_version < SOF_IPC4_FW_REGS_ABI_VER)
                support_info = false;
 
+       /* For delay reporting the get_host_byte_counter callback is needed */
+       if (!sof_ops(sdev) || !sof_ops(sdev)->get_host_byte_counter)
+               support_info = false;
+
        for_each_pcm_streams(stream) {
                pipeline_list = &spcm->stream[stream].pipeline_list;
 
@@ -835,7 +870,6 @@ static int sof_ipc4_get_stream_start_offset(struct snd_sof_dev *sdev,
        struct sof_ipc4_copier *host_copier = time_info->host_copier;
        struct sof_ipc4_copier *dai_copier = time_info->dai_copier;
        struct sof_ipc4_pipeline_registers ppl_reg;
-       u64 stream_start_position;
        u32 dai_sample_size;
        u32 ch, node_index;
        u32 offset;
@@ -852,38 +886,51 @@ static int sof_ipc4_get_stream_start_offset(struct snd_sof_dev *sdev,
        if (ppl_reg.stream_start_offset == SOF_IPC4_INVALID_STREAM_POSITION)
                return -EINVAL;
 
-       stream_start_position = ppl_reg.stream_start_offset;
        ch = dai_copier->data.out_format.fmt_cfg;
        ch = SOF_IPC4_AUDIO_FORMAT_CFG_CHANNELS_COUNT(ch);
        dai_sample_size = (dai_copier->data.out_format.bit_depth >> 3) * ch;
-       /* convert offset to sample count */
-       do_div(stream_start_position, dai_sample_size);
-       time_info->stream_start_offset = stream_start_position;
+
+       /* convert offsets to frame count */
+       time_info->stream_start_offset = ppl_reg.stream_start_offset;
+       do_div(time_info->stream_start_offset, dai_sample_size);
+       time_info->stream_end_offset = ppl_reg.stream_end_offset;
+       do_div(time_info->stream_end_offset, dai_sample_size);
+
+       /*
+        * Calculate the wrap boundary needed for the delay calculation.
+        * The host counter is in bytes; it will wrap earlier than the
+        * frame-based link counter.
+        */
+       time_info->boundary = div64_u64(~((u64)0),
+                                       frames_to_bytes(substream->runtime, 1));
+       /* Initialize the delay value to 0 (no delay) */
+       time_info->delay = 0;
 
        return 0;
 }
 
-static snd_pcm_sframes_t sof_ipc4_pcm_delay(struct snd_soc_component *component,
-                                           struct snd_pcm_substream *substream)
+static int sof_ipc4_pcm_pointer(struct snd_soc_component *component,
+                               struct snd_pcm_substream *substream,
+                               snd_pcm_uframes_t *pointer)
 {
        struct snd_sof_dev *sdev = snd_soc_component_get_drvdata(component);
        struct snd_soc_pcm_runtime *rtd = snd_soc_substream_to_rtd(substream);
        struct sof_ipc4_timestamp_info *time_info;
        struct sof_ipc4_llp_reading_slot llp;
-       snd_pcm_uframes_t head_ptr, tail_ptr;
+       snd_pcm_uframes_t head_cnt, tail_cnt;
        struct snd_sof_pcm_stream *stream;
+       u64 dai_cnt, host_cnt, host_ptr;
        struct snd_sof_pcm *spcm;
-       u64 tmp_ptr;
        int ret;
 
        spcm = snd_sof_find_spcm_dai(component, rtd);
        if (!spcm)
-               return 0;
+               return -EOPNOTSUPP;
 
        stream = &spcm->stream[substream->stream];
        time_info = stream->private;
        if (!time_info)
-               return 0;
+               return -EOPNOTSUPP;
 
        /*
         * stream_start_offset is updated to memory window by FW based on
@@ -893,45 +940,116 @@ static snd_pcm_sframes_t sof_ipc4_pcm_delay(struct snd_soc_component *component,
        if (time_info->stream_start_offset == SOF_IPC4_INVALID_STREAM_POSITION) {
                ret = sof_ipc4_get_stream_start_offset(sdev, substream, stream, time_info);
                if (ret < 0)
-                       return 0;
+                       return -EOPNOTSUPP;
        }
 
+       /* For delay calculation we need the host counter */
+       host_cnt = snd_sof_pcm_get_host_byte_counter(sdev, component, substream);
+       host_ptr = host_cnt;
+
+       /* convert the host_cnt to frames */
+       host_cnt = div64_u64(host_cnt, frames_to_bytes(substream->runtime, 1));
+
        /*
-        * HDaudio links don't support the LLP counter reported by firmware
-        * the link position is read directly from hardware registers.
+        * If the LLP counter is not reported by firmware in the SRAM window
+        * then read the dai (link) counter via host accessible means if
+        * available.
         */
        if (!time_info->llp_offset) {
-               tmp_ptr = snd_sof_pcm_get_stream_position(sdev, component, substream);
-               if (!tmp_ptr)
-                       return 0;
+               dai_cnt = snd_sof_pcm_get_dai_frame_counter(sdev, component, substream);
+               if (!dai_cnt)
+                       return -EOPNOTSUPP;
        } else {
                sof_mailbox_read(sdev, time_info->llp_offset, &llp, sizeof(llp));
-               tmp_ptr = ((u64)llp.reading.llp_u << 32) | llp.reading.llp_l;
+               dai_cnt = ((u64)llp.reading.llp_u << 32) | llp.reading.llp_l;
        }
+       dai_cnt += time_info->stream_end_offset;
 
-       /* In two cases dai dma position is not accurate
+       /* In two cases dai dma counter is not accurate
         * (1) dai pipeline is started before host pipeline
-        * (2) multiple streams mixed into one. Each stream has the same dai dma position
+        * (2) multiple streams mixed into one. Each stream has the same dai dma
+        *     counter
         *
-        * Firmware calculates correct stream_start_offset for all cases including above two.
-        * Driver subtracts stream_start_offset from dai dma position to get accurate one
+        * Firmware calculates the correct stream_start_offset for all cases,
+        * including the two above.
+        * The driver subtracts stream_start_offset from the dai dma counter
+        * to get an accurate one.
         */
-       tmp_ptr -= time_info->stream_start_offset;
 
-       /* Calculate the delay taking into account that both pointer can wrap */
-       div64_u64_rem(tmp_ptr, substream->runtime->boundary, &tmp_ptr);
+       /*
+        * On stream start the dai counter might not yet have reached the
+        * stream_start_offset value which means that no frames have left the
+        * DSP yet from the audio stream (on playback, capture streams have
+        * offset of 0 as we start capturing right away).
+        * In this case we need to adjust the distance between the counters by
+        * increasing the host counter by (offset - dai_counter).
+        * Otherwise the dai_counter needs to be adjusted to reflect the number
+        * of valid frames passed on the DAI side.
+        *
+        * The delay is the difference between the counters on the two
+        * sides of the DSP.
+        */
+       if (dai_cnt < time_info->stream_start_offset) {
+               host_cnt += time_info->stream_start_offset - dai_cnt;
+               dai_cnt = 0;
+       } else {
+               dai_cnt -= time_info->stream_start_offset;
+       }
+
+       /* Wrap the dai counter at the boundary where the host counter wraps */
+       div64_u64_rem(dai_cnt, time_info->boundary, &dai_cnt);
+
        if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) {
-               head_ptr = substream->runtime->status->hw_ptr;
-               tail_ptr = tmp_ptr;
+               head_cnt = host_cnt;
+               tail_cnt = dai_cnt;
        } else {
-               head_ptr = tmp_ptr;
-               tail_ptr = substream->runtime->status->hw_ptr;
+               head_cnt = dai_cnt;
+               tail_cnt = host_cnt;
+       }
+
+       if (head_cnt < tail_cnt) {
+               time_info->delay = time_info->boundary - tail_cnt + head_cnt;
+               goto out;
        }
 
-       if (head_ptr < tail_ptr)
-               return substream->runtime->boundary - tail_ptr + head_ptr;
+       time_info->delay = head_cnt - tail_cnt;
+
+out:
+       /*
+        * Convert the host byte counter to PCM pointer which wraps in buffer
+        * and it is in frames
+        */
+       div64_u64_rem(host_ptr, snd_pcm_lib_buffer_bytes(substream), &host_ptr);
+       *pointer = bytes_to_frames(substream->runtime, host_ptr);
+
+       return 0;
+}
+
+static snd_pcm_sframes_t sof_ipc4_pcm_delay(struct snd_soc_component *component,
+                                           struct snd_pcm_substream *substream)
+{
+       struct snd_soc_pcm_runtime *rtd = snd_soc_substream_to_rtd(substream);
+       struct sof_ipc4_timestamp_info *time_info;
+       struct snd_sof_pcm_stream *stream;
+       struct snd_sof_pcm *spcm;
+
+       spcm = snd_sof_find_spcm_dai(component, rtd);
+       if (!spcm)
+               return 0;
+
+       stream = &spcm->stream[substream->stream];
+       time_info = stream->private;
+       /*
+        * Report the stored delay value calculated in the pointer callback.
+        * In the unlikely event that the calculation was skipped/aborted, the
+        * default delay of 0 is returned.
+        */
+       if (time_info)
+               return time_info->delay;
+
+       /* No delay information available, report 0 as delay */
+       return 0;
 
-       return head_ptr - tail_ptr;
 }
 
 const struct sof_ipc_pcm_ops ipc4_pcm_ops = {
@@ -941,6 +1059,7 @@ const struct sof_ipc_pcm_ops ipc4_pcm_ops = {
        .dai_link_fixup = sof_ipc4_pcm_dai_link_fixup,
        .pcm_setup = sof_ipc4_pcm_setup,
        .pcm_free = sof_ipc4_pcm_free,
+       .pointer = sof_ipc4_pcm_pointer,
        .delay = sof_ipc4_pcm_delay,
        .ipc_first_on_start = true,
        .platform_stop_during_hw_free = true,
index f3b908b093f9562ddeb6932b4f1743a27c2b3c09..afed618a15f061a8588466490ee38ea19a80bc3d 100644 (file)
@@ -92,20 +92,6 @@ struct sof_ipc4_fw_data {
        struct mutex pipeline_state_mutex; /* protect pipeline triggers, ref counts and states */
 };
 
-/**
- * struct sof_ipc4_timestamp_info - IPC4 timestamp info
- * @host_copier: the host copier of the pcm stream
- * @dai_copier: the dai copier of the pcm stream
- * @stream_start_offset: reported by fw in memory window
- * @llp_offset: llp offset in memory window
- */
-struct sof_ipc4_timestamp_info {
-       struct sof_ipc4_copier *host_copier;
-       struct sof_ipc4_copier *dai_copier;
-       u64 stream_start_offset;
-       u32 llp_offset;
-};
-
 extern const struct sof_ipc_fw_loader_ops ipc4_loader_ops;
 extern const struct sof_ipc_tplg_ops ipc4_tplg_ops;
 extern const struct sof_ipc_tplg_control_ops tplg_ipc4_control_ops;
index f28edd9830c1b3e25961add70dff87a489cfa119..5cca058421260978dd18e992b09dfff58b44bbdb 100644 (file)
@@ -412,8 +412,9 @@ static int sof_ipc4_widget_setup_pcm(struct snd_sof_widget *swidget)
        struct sof_ipc4_available_audio_format *available_fmt;
        struct snd_soc_component *scomp = swidget->scomp;
        struct sof_ipc4_copier *ipc4_copier;
+       struct snd_sof_pcm *spcm;
        int node_type = 0;
-       int ret;
+       int ret, dir;
 
        ipc4_copier = kzalloc(sizeof(*ipc4_copier), GFP_KERNEL);
        if (!ipc4_copier)
@@ -447,6 +448,25 @@ static int sof_ipc4_widget_setup_pcm(struct snd_sof_widget *swidget)
        }
        dev_dbg(scomp->dev, "host copier '%s' node_type %u\n", swidget->widget->name, node_type);
 
+       spcm = snd_sof_find_spcm_comp(scomp, swidget->comp_id, &dir);
+       if (!spcm)
+               goto skip_gtw_cfg;
+
+       if (dir == SNDRV_PCM_STREAM_PLAYBACK) {
+               struct snd_sof_pcm_stream *sps = &spcm->stream[dir];
+
+               sof_update_ipc_object(scomp, &sps->dsp_max_burst_size_in_ms,
+                                     SOF_COPIER_DEEP_BUFFER_TOKENS,
+                                     swidget->tuples,
+                                     swidget->num_tuples, sizeof(u32), 1);
+               /* Set default DMA buffer size if it is not specified in topology */
+               if (!sps->dsp_max_burst_size_in_ms)
+                       sps->dsp_max_burst_size_in_ms = SOF_IPC4_MIN_DMA_BUFFER_SIZE;
+       } else {
+               /* Capture data is copied from DSP to host in 1ms bursts */
+               spcm->stream[dir].dsp_max_burst_size_in_ms = 1;
+       }
+
 skip_gtw_cfg:
        ipc4_copier->gtw_attr = kzalloc(sizeof(*ipc4_copier->gtw_attr), GFP_KERNEL);
        if (!ipc4_copier->gtw_attr) {
index 6cf21e829e07272ccf4c7005f74f9ae61403d39b..3cd748e13460916517d9533c48d2172d556fc344 100644 (file)
@@ -523,12 +523,26 @@ static inline int snd_sof_pcm_platform_ack(struct snd_sof_dev *sdev,
        return 0;
 }
 
-static inline u64 snd_sof_pcm_get_stream_position(struct snd_sof_dev *sdev,
-                                                 struct snd_soc_component *component,
-                                                 struct snd_pcm_substream *substream)
+static inline u64
+snd_sof_pcm_get_dai_frame_counter(struct snd_sof_dev *sdev,
+                                 struct snd_soc_component *component,
+                                 struct snd_pcm_substream *substream)
 {
-       if (sof_ops(sdev) && sof_ops(sdev)->get_stream_position)
-               return sof_ops(sdev)->get_stream_position(sdev, component, substream);
+       if (sof_ops(sdev) && sof_ops(sdev)->get_dai_frame_counter)
+               return sof_ops(sdev)->get_dai_frame_counter(sdev, component,
+                                                           substream);
+
+       return 0;
+}
+
+static inline u64
+snd_sof_pcm_get_host_byte_counter(struct snd_sof_dev *sdev,
+                                 struct snd_soc_component *component,
+                                 struct snd_pcm_substream *substream)
+{
+       if (sof_ops(sdev) && sof_ops(sdev)->get_host_byte_counter)
+               return sof_ops(sdev)->get_host_byte_counter(sdev, component,
+                                                           substream);
 
        return 0;
 }
index 33d576b1764783ab3468591703e0c97106893e9e..f03cee94bce62642e3c419d4f956a2011ea4dd3f 100644 (file)
@@ -388,13 +388,21 @@ static snd_pcm_uframes_t sof_pcm_pointer(struct snd_soc_component *component,
 {
        struct snd_soc_pcm_runtime *rtd = snd_soc_substream_to_rtd(substream);
        struct snd_sof_dev *sdev = snd_soc_component_get_drvdata(component);
+       const struct sof_ipc_pcm_ops *pcm_ops = sof_ipc_get_ops(sdev, pcm);
        struct snd_sof_pcm *spcm;
        snd_pcm_uframes_t host, dai;
+       int ret = -EOPNOTSUPP;
 
        /* nothing to do for BE */
        if (rtd->dai_link->no_pcm)
                return 0;
 
+       if (pcm_ops && pcm_ops->pointer)
+               ret = pcm_ops->pointer(component, substream, &host);
+
+       if (ret != -EOPNOTSUPP)
+               return ret ? ret : host;
+
        /* use dsp ops pointer callback directly if set */
        if (sof_ops(sdev)->pcm_pointer)
                return sof_ops(sdev)->pcm_pointer(sdev, substream);
index 9ea2ac5adac79ee322f82060b908ce529cd9c43b..86bbb531e142c72be1ca5d710c466d16c9058734 100644 (file)
@@ -103,7 +103,10 @@ struct snd_sof_dai_config_data {
  *            additional memory in the SOF PCM stream structure
  * @pcm_free: Function pointer for PCM free that can be used for freeing any
  *            additional memory in the SOF PCM stream structure
- * @delay: Function pointer for pcm delay calculation
+ * @pointer: Function pointer for pcm pointer
+ *          Note: the @pointer callback may return -EOPNOTSUPP which should be
+ *                handled in the same way as if the callback is not provided
+ * @delay: Function pointer for pcm delay reporting
  * @reset_hw_params_during_stop: Flag indicating whether the hw_params should be reset during the
  *                              STOP pcm trigger
  * @ipc_first_on_start: Send IPC before invoking platform trigger during
@@ -124,6 +127,9 @@ struct sof_ipc_pcm_ops {
        int (*dai_link_fixup)(struct snd_soc_pcm_runtime *rtd, struct snd_pcm_hw_params *params);
        int (*pcm_setup)(struct snd_sof_dev *sdev, struct snd_sof_pcm *spcm);
        void (*pcm_free)(struct snd_sof_dev *sdev, struct snd_sof_pcm *spcm);
+       int (*pointer)(struct snd_soc_component *component,
+                      struct snd_pcm_substream *substream,
+                      snd_pcm_uframes_t *pointer);
        snd_pcm_sframes_t (*delay)(struct snd_soc_component *component,
                                   struct snd_pcm_substream *substream);
        bool reset_hw_params_during_stop;
@@ -322,6 +328,7 @@ struct snd_sof_pcm_stream {
        struct work_struct period_elapsed_work;
        struct snd_soc_dapm_widget_list *list; /* list of connected DAPM widgets */
        bool d0i3_compatible; /* DSP can be in D0I3 when this pcm is opened */
+       unsigned int dsp_max_burst_size_in_ms; /* The maximum size of the host DMA burst in ms */
        /*
         * flag to indicate that the DSP pipelines should be kept
         * active or not while suspending the stream
index d453a4ce3b219d601813310c22cbf11029a08a77..d3c436f826046bca9f385b429d6d7e1639600f63 100644 (file)
@@ -262,13 +262,25 @@ struct snd_sof_dsp_ops {
        int (*pcm_ack)(struct snd_sof_dev *sdev, struct snd_pcm_substream *substream); /* optional */
 
        /*
-        * optional callback to retrieve the link DMA position for the substream
-        * when the position is not reported in the shared SRAM windows but
-        * instead from a host-accessible hardware counter.
+        * optional callback to retrieve the number of frames left/arrived from/to
+        * the DSP on the DAI side (link/codec/DMIC/etc).
+        *
+        * The callback is used when the firmware does not provide this information
+        * via the shared SRAM window and it can be retrieved by the host.
         */
-       u64 (*get_stream_position)(struct snd_sof_dev *sdev,
-                                  struct snd_soc_component *component,
-                                  struct snd_pcm_substream *substream); /* optional */
+       u64 (*get_dai_frame_counter)(struct snd_sof_dev *sdev,
+                                    struct snd_soc_component *component,
+                                    struct snd_pcm_substream *substream); /* optional */
+
+       /*
+        * Optional callback to retrieve the number of bytes left/arrived from/to
+        * the DSP on the host side (bytes between host ALSA buffer and DSP).
+        *
+        * The callback is needed for ALSA delay reporting.
+        */
+       u64 (*get_host_byte_counter)(struct snd_sof_dev *sdev,
+                                    struct snd_soc_component *component,
+                                    struct snd_pcm_substream *substream); /* optional */
 
        /* host read DSP stream data */
        int (*ipc_msg_data)(struct snd_sof_dev *sdev,
index b67617b68e509d2c86d78058f7796a64aab00f41..f4437015d43a7500b809a303f175b211662d500f 100644 (file)
@@ -202,7 +202,7 @@ int line6_send_raw_message_async(struct usb_line6 *line6, const char *buffer,
        struct urb *urb;
 
        /* create message: */
-       msg = kmalloc(sizeof(struct message), GFP_ATOMIC);
+       msg = kzalloc(sizeof(struct message), GFP_ATOMIC);
        if (msg == NULL)
                return -ENOMEM;
 
@@ -688,7 +688,7 @@ static int line6_init_cap_control(struct usb_line6 *line6)
        int ret;
 
        /* initialize USB buffers: */
-       line6->buffer_listen = kmalloc(LINE6_BUFSIZE_LISTEN, GFP_KERNEL);
+       line6->buffer_listen = kzalloc(LINE6_BUFSIZE_LISTEN, GFP_KERNEL);
        if (!line6->buffer_listen)
                return -ENOMEM;
 
@@ -697,7 +697,7 @@ static int line6_init_cap_control(struct usb_line6 *line6)
                return -ENOMEM;
 
        if (line6->properties->capabilities & LINE6_CAP_CONTROL_MIDI) {
-               line6->buffer_message = kmalloc(LINE6_MIDI_MESSAGE_MAXLEN, GFP_KERNEL);
+               line6->buffer_message = kzalloc(LINE6_MIDI_MESSAGE_MAXLEN, GFP_KERNEL);
                if (!line6->buffer_message)
                        return -ENOMEM;
 
index 4b0673bf52c2e615017bf2b94da1f6fc4392e532..07cfad817d53908f2325505d2b9cb644a808a689 100644 (file)
@@ -8,6 +8,7 @@
 #include <linux/build_bug.h>
 #include <linux/compiler.h>
 #include <linux/math.h>
+#include <linux/panic.h>
 #include <endian.h>
 #include <byteswap.h>
 
index f3c82ab5b14cd77819030096b81e0b67cba0df1d..7d73da0980473fd3fdbdcd88e9e041077d5a2df3 100644 (file)
@@ -37,4 +37,9 @@ static inline void totalram_pages_add(long count)
 {
 }
 
+static inline int early_pfn_to_nid(unsigned long pfn)
+{
+       return 0;
+}
+
 #endif
diff --git a/tools/include/linux/panic.h b/tools/include/linux/panic.h
new file mode 100644 (file)
index 0000000..9c8f17a
--- /dev/null
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _TOOLS_LINUX_PANIC_H
+#define _TOOLS_LINUX_PANIC_H
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+static inline void panic(const char *fmt, ...)
+{
+       va_list argp;
+
+       va_start(argp, fmt);
+       vfprintf(stderr, fmt, argp);
+       va_end(argp);
+       exit(-1);
+}
+
+#endif
index 8f08c3fd498d5b81185519728fc1c28a8a0d4d5f..0d3672e5d9ed1553a720f3b0b52ca71f81fbc2d9 100644 (file)
@@ -67,6 +67,10 @@ The column name "all" can be used to enable all disabled-by-default built-in cou
 .PP
 \fB--quiet\fP Do not decode and print the system configuration header information.
 .PP
+\fB--no-msr\fP Disable all the uses of the MSR driver.
+.PP
+\fB--no-perf\fP Disable all the uses of the perf API.
+.PP
 \fB--interval seconds\fP overrides the default 5.0 second measurement interval.
 .PP
 \fB--num_iterations num\fP number of the measurement iterations.
@@ -125,9 +129,17 @@ The system configuration dump (if --quiet is not used) is followed by statistics
 .PP
 \fBPkgTmp\fP Degrees Celsius reported by the per-package Package Thermal Monitor.
 .PP
-\fBGFX%rc6\fP The percentage of time the GPU is in the "render C6" state, rc6, during the measurement interval. From /sys/class/drm/card0/power/rc6_residency_ms.
+\fBGFX%rc6\fP The percentage of time the GPU is in the "render C6" state, rc6, during the measurement interval. From /sys/class/drm/card0/power/rc6_residency_ms or /sys/class/drm/card0/gt/gt0/rc6_residency_ms or /sys/class/drm/card0/device/tile0/gtN/gtidle/idle_residency_ms depending on the graphics driver being used.
 .PP
-\fBGFXMHz\fP Instantaneous snapshot of what sysfs presents at the end of the measurement interval. From /sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz.
+\fBGFXMHz\fP Instantaneous snapshot of what sysfs presents at the end of the measurement interval. From /sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz or /sys/class/drm/card0/gt_cur_freq_mhz or /sys/class/drm/card0/gt/gt0/rps_cur_freq_mhz or /sys/class/drm/card0/device/tile0/gtN/freq0/cur_freq depending on the graphics driver being used.
+.PP
+\fBGFXAMHz\fP Instantaneous snapshot of what sysfs presents at the end of the measurement interval. From /sys/class/graphics/fb0/device/drm/card0/gt_act_freq_mhz or /sys/class/drm/card0/gt_act_freq_mhz or /sys/class/drm/card0/gt/gt0/rps_act_freq_mhz or /sys/class/drm/card0/device/tile0/gtN/freq0/act_freq depending on the graphics driver being used.
+.PP
+\fBSAM%mc6\fP The percentage of time the SA Media is in the "module C6" state, mc6, during the measurement interval. From /sys/class/drm/card0/gt/gt1/rc6_residency_ms or /sys/class/drm/card0/device/tile0/gtN/gtidle/idle_residency_ms depending on the graphics driver being used.
+.PP
+\fBSAMMHz\fP Instantaneous snapshot of what sysfs presents at the end of the measurement interval. From /sys/class/drm/card0/gt/gt1/rps_cur_freq_mhz or /sys/class/drm/card0/device/tile0/gtN/freq0/cur_freq depending on the graphics driver being used.
+.PP
+\fBSAMAMHz\fP Instantaneous snapshot of what sysfs presents at the end of the measurement interval. From /sys/class/drm/card0/gt/gt1/rps_act_freq_mhz or /sys/class/drm/card0/device/tile0/gtN/freq0/act_freq depending on the graphics driver being used.
 .PP
 \fBPkg%pc2, Pkg%pc3, Pkg%pc6, Pkg%pc7\fP percentage residency in hardware package idle states.  These numbers are from hardware residency counters.
 .PP
@@ -370,7 +382,7 @@ below the processor's base frequency.
 
 Busy% = MPERF_delta/TSC_delta
 
-Bzy_MHz = TSC_delta/APERF_delta/MPERF_delta/measurement_interval
+Bzy_MHz = TSC_delta*APERF_delta/MPERF_delta/measurement_interval
 
 Note that these calculations depend on TSC_delta, so they
 are not reliable during intervals when TSC_MHz is not running at the base frequency.
index 7a334377f92b978fa642a0071b19f33d7e6fe74e..98256468e24806acfc0daee374d0cf9877e92131 100644 (file)
@@ -3,7 +3,7 @@
  * turbostat -- show CPU frequency and C-state residency
  * on modern Intel and AMD processors.
  *
- * Copyright (c) 2023 Intel Corporation.
+ * Copyright (c) 2024 Intel Corporation.
  * Len Brown <len.brown@intel.com>
  */
 
@@ -36,6 +36,8 @@
 #include <linux/perf_event.h>
 #include <asm/unistd.h>
 #include <stdbool.h>
+#include <assert.h>
+#include <linux/kernel.h>
 
 #define UNUSED(x) (void)(x)
 
 #define        NAME_BYTES 20
 #define PATH_BYTES 128
 
+#define MAX_NOFILE 0x8000
+
 enum counter_scope { SCOPE_CPU, SCOPE_CORE, SCOPE_PACKAGE };
 enum counter_type { COUNTER_ITEMS, COUNTER_CYCLES, COUNTER_SECONDS, COUNTER_USEC };
 enum counter_format { FORMAT_RAW, FORMAT_DELTA, FORMAT_PERCENT };
+enum amperf_source { AMPERF_SOURCE_PERF, AMPERF_SOURCE_MSR };
+enum rapl_source { RAPL_SOURCE_NONE, RAPL_SOURCE_PERF, RAPL_SOURCE_MSR };
 
 struct msr_counter {
        unsigned int msr_num;
@@ -127,6 +133,9 @@ struct msr_counter bic[] = {
        { 0x0, "IPC", "", 0, 0, 0, NULL, 0 },
        { 0x0, "CoreThr", "", 0, 0, 0, NULL, 0 },
        { 0x0, "UncMHz", "", 0, 0, 0, NULL, 0 },
+       { 0x0, "SAM%mc6", "", 0, 0, 0, NULL, 0 },
+       { 0x0, "SAMMHz", "", 0, 0, 0, NULL, 0 },
+       { 0x0, "SAMAMHz", "", 0, 0, 0, NULL, 0 },
 };
 
 #define MAX_BIC (sizeof(bic) / sizeof(struct msr_counter))
@@ -185,11 +194,14 @@ struct msr_counter bic[] = {
 #define        BIC_IPC         (1ULL << 52)
 #define        BIC_CORE_THROT_CNT      (1ULL << 53)
 #define        BIC_UNCORE_MHZ          (1ULL << 54)
+#define        BIC_SAM_mc6             (1ULL << 55)
+#define        BIC_SAMMHz              (1ULL << 56)
+#define        BIC_SAMACTMHz           (1ULL << 57)
 
 #define BIC_TOPOLOGY (BIC_Package | BIC_Node | BIC_CoreCnt | BIC_PkgCnt | BIC_Core | BIC_CPU | BIC_Die )
 #define BIC_THERMAL_PWR ( BIC_CoreTmp | BIC_PkgTmp | BIC_PkgWatt | BIC_CorWatt | BIC_GFXWatt | BIC_RAMWatt | BIC_PKG__ | BIC_RAM__)
-#define BIC_FREQUENCY ( BIC_Avg_MHz | BIC_Busy | BIC_Bzy_MHz | BIC_TSC_MHz | BIC_GFXMHz | BIC_GFXACTMHz | BIC_UNCORE_MHZ)
-#define BIC_IDLE ( BIC_sysfs | BIC_CPU_c1 | BIC_CPU_c3 | BIC_CPU_c6 | BIC_CPU_c7 | BIC_GFX_rc6 | BIC_Pkgpc2 | BIC_Pkgpc3 | BIC_Pkgpc6 | BIC_Pkgpc7 | BIC_Pkgpc8 | BIC_Pkgpc9 | BIC_Pkgpc10 | BIC_CPU_LPI | BIC_SYS_LPI | BIC_Mod_c6 | BIC_Totl_c0 | BIC_Any_c0 | BIC_GFX_c0 | BIC_CPUGFX)
+#define BIC_FREQUENCY (BIC_Avg_MHz | BIC_Busy | BIC_Bzy_MHz | BIC_TSC_MHz | BIC_GFXMHz | BIC_GFXACTMHz | BIC_SAMMHz | BIC_SAMACTMHz | BIC_UNCORE_MHZ)
+#define BIC_IDLE (BIC_sysfs | BIC_CPU_c1 | BIC_CPU_c3 | BIC_CPU_c6 | BIC_CPU_c7 | BIC_GFX_rc6 | BIC_Pkgpc2 | BIC_Pkgpc3 | BIC_Pkgpc6 | BIC_Pkgpc7 | BIC_Pkgpc8 | BIC_Pkgpc9 | BIC_Pkgpc10 | BIC_CPU_LPI | BIC_SYS_LPI | BIC_Mod_c6 | BIC_Totl_c0 | BIC_Any_c0 | BIC_GFX_c0 | BIC_CPUGFX | BIC_SAM_mc6)
 #define BIC_OTHER ( BIC_IRQ | BIC_SMI | BIC_ThreadC | BIC_CoreTmp | BIC_IPC)
 
 #define BIC_DISABLED_BY_DEFAULT        (BIC_USEC | BIC_TOD | BIC_APIC | BIC_X2APIC)
@@ -204,10 +216,13 @@ unsigned long long bic_present = BIC_USEC | BIC_TOD | BIC_sysfs | BIC_APIC | BIC
 #define BIC_NOT_PRESENT(COUNTER_BIT) (bic_present &= ~COUNTER_BIT)
 #define BIC_IS_ENABLED(COUNTER_BIT) (bic_enabled & COUNTER_BIT)
 
+struct amperf_group_fd;
+
 char *proc_stat = "/proc/stat";
 FILE *outf;
 int *fd_percpu;
 int *fd_instr_count_percpu;
+struct amperf_group_fd *fd_amperf_percpu;      /* File descriptors for perf group with APERF and MPERF counters. */
 struct timeval interval_tv = { 5, 0 };
 struct timespec interval_ts = { 5, 0 };
 
@@ -242,11 +257,8 @@ char *output_buffer, *outp;
 unsigned int do_dts;
 unsigned int do_ptm;
 unsigned int do_ipc;
-unsigned long long gfx_cur_rc6_ms;
 unsigned long long cpuidle_cur_cpu_lpi_us;
 unsigned long long cpuidle_cur_sys_lpi_us;
-unsigned int gfx_cur_mhz;
-unsigned int gfx_act_mhz;
 unsigned int tj_max;
 unsigned int tj_max_override;
 double rapl_power_units, rapl_time_units;
@@ -263,6 +275,28 @@ unsigned int has_hwp_epp;  /* IA32_HWP_REQUEST[bits 31:24] */
 unsigned int has_hwp_pkg;      /* IA32_HWP_REQUEST_PKG */
 unsigned int first_counter_read = 1;
 int ignore_stdin;
+bool no_msr;
+bool no_perf;
+enum amperf_source amperf_source;
+
+enum gfx_sysfs_idx {
+       GFX_rc6,
+       GFX_MHz,
+       GFX_ACTMHz,
+       SAM_mc6,
+       SAM_MHz,
+       SAM_ACTMHz,
+       GFX_MAX
+};
+
+struct gfx_sysfs_info {
+       const char *path;
+       FILE *fp;
+       unsigned int val;
+       unsigned long long val_ull;
+};
+
+static struct gfx_sysfs_info gfx_info[GFX_MAX];
 
 int get_msr(int cpu, off_t offset, unsigned long long *msr);
 
@@ -652,6 +686,7 @@ static const struct platform_features icx_features = {
        .bclk_freq = BCLK_100MHZ,
        .supported_cstates = CC1 | CC6 | PC2 | PC6,
        .cst_limit = CST_LIMIT_ICX,
+       .has_msr_core_c1_res = 1,
        .has_irtl_msrs = 1,
        .has_cst_prewake_bit = 1,
        .trl_msrs = TRL_BASE | TRL_CORECOUNT,
@@ -948,6 +983,175 @@ size_t cpu_present_setsize, cpu_effective_setsize, cpu_allowed_setsize, cpu_affi
 #define MAX_ADDED_THREAD_COUNTERS 24
 #define BITMASK_SIZE 32
 
+/* Indexes used to map data read from perf and MSRs into global variables */
+enum rapl_rci_index {
+       RAPL_RCI_INDEX_ENERGY_PKG = 0,
+       RAPL_RCI_INDEX_ENERGY_CORES = 1,
+       RAPL_RCI_INDEX_DRAM = 2,
+       RAPL_RCI_INDEX_GFX = 3,
+       RAPL_RCI_INDEX_PKG_PERF_STATUS = 4,
+       RAPL_RCI_INDEX_DRAM_PERF_STATUS = 5,
+       RAPL_RCI_INDEX_CORE_ENERGY = 6,
+       NUM_RAPL_COUNTERS,
+};
+
+enum rapl_unit {
+       RAPL_UNIT_INVALID,
+       RAPL_UNIT_JOULES,
+       RAPL_UNIT_WATTS,
+};
+
+struct rapl_counter_info_t {
+       unsigned long long data[NUM_RAPL_COUNTERS];
+       enum rapl_source source[NUM_RAPL_COUNTERS];
+       unsigned long long flags[NUM_RAPL_COUNTERS];
+       double scale[NUM_RAPL_COUNTERS];
+       enum rapl_unit unit[NUM_RAPL_COUNTERS];
+
+       union {
+               /* Active when source == RAPL_SOURCE_MSR */
+               struct {
+                       unsigned long long msr[NUM_RAPL_COUNTERS];
+                       unsigned long long msr_mask[NUM_RAPL_COUNTERS];
+                       int msr_shift[NUM_RAPL_COUNTERS];
+               };
+       };
+
+       int fd_perf;
+};
+
+/* struct rapl_counter_info_t for each RAPL domain */
+struct rapl_counter_info_t *rapl_counter_info_perdomain;
+
+#define RAPL_COUNTER_FLAG_USE_MSR_SUM (1u << 1)
+
+struct rapl_counter_arch_info {
+       int feature_mask;       /* Mask for testing if the counter is supported on host */
+       const char *perf_subsys;
+       const char *perf_name;
+       unsigned long long msr;
+       unsigned long long msr_mask;
+        int msr_shift;          /* Positive means shift right, negative means shift left */
+       double *platform_rapl_msr_scale;        /* Scale applied to values read by MSR (platform dependent, filled at runtime) */
+       unsigned int rci_index; /* Maps data from perf counters to global variables */
+       unsigned long long bic;
+       double compat_scale;    /* Some counters require constant scaling to be in the same range as other, similar ones */
+       unsigned long long flags;
+};
+
+static const struct rapl_counter_arch_info rapl_counter_arch_infos[] = {
+       {
+        .feature_mask = RAPL_PKG,
+        .perf_subsys = "power",
+        .perf_name = "energy-pkg",
+        .msr = MSR_PKG_ENERGY_STATUS,
+        .msr_mask = 0xFFFFFFFFFFFFFFFF,
+        .msr_shift = 0,
+        .platform_rapl_msr_scale = &rapl_energy_units,
+        .rci_index = RAPL_RCI_INDEX_ENERGY_PKG,
+        .bic = BIC_PkgWatt | BIC_Pkg_J,
+        .compat_scale = 1.0,
+        .flags = RAPL_COUNTER_FLAG_USE_MSR_SUM,
+         },
+       {
+        .feature_mask = RAPL_AMD_F17H,
+        .perf_subsys = "power",
+        .perf_name = "energy-pkg",
+        .msr = MSR_PKG_ENERGY_STAT,
+        .msr_mask = 0xFFFFFFFFFFFFFFFF,
+        .msr_shift = 0,
+        .platform_rapl_msr_scale = &rapl_energy_units,
+        .rci_index = RAPL_RCI_INDEX_ENERGY_PKG,
+        .bic = BIC_PkgWatt | BIC_Pkg_J,
+        .compat_scale = 1.0,
+        .flags = RAPL_COUNTER_FLAG_USE_MSR_SUM,
+         },
+       {
+        .feature_mask = RAPL_CORE_ENERGY_STATUS,
+        .perf_subsys = "power",
+        .perf_name = "energy-cores",
+        .msr = MSR_PP0_ENERGY_STATUS,
+        .msr_mask = 0xFFFFFFFFFFFFFFFF,
+        .msr_shift = 0,
+        .platform_rapl_msr_scale = &rapl_energy_units,
+        .rci_index = RAPL_RCI_INDEX_ENERGY_CORES,
+        .bic = BIC_CorWatt | BIC_Cor_J,
+        .compat_scale = 1.0,
+        .flags = RAPL_COUNTER_FLAG_USE_MSR_SUM,
+         },
+       {
+        .feature_mask = RAPL_DRAM,
+        .perf_subsys = "power",
+        .perf_name = "energy-ram",
+        .msr = MSR_DRAM_ENERGY_STATUS,
+        .msr_mask = 0xFFFFFFFFFFFFFFFF,
+        .msr_shift = 0,
+        .platform_rapl_msr_scale = &rapl_dram_energy_units,
+        .rci_index = RAPL_RCI_INDEX_DRAM,
+        .bic = BIC_RAMWatt | BIC_RAM_J,
+        .compat_scale = 1.0,
+        .flags = RAPL_COUNTER_FLAG_USE_MSR_SUM,
+         },
+       {
+        .feature_mask = RAPL_GFX,
+        .perf_subsys = "power",
+        .perf_name = "energy-gpu",
+        .msr = MSR_PP1_ENERGY_STATUS,
+        .msr_mask = 0xFFFFFFFFFFFFFFFF,
+        .msr_shift = 0,
+        .platform_rapl_msr_scale = &rapl_energy_units,
+        .rci_index = RAPL_RCI_INDEX_GFX,
+        .bic = BIC_GFXWatt | BIC_GFX_J,
+        .compat_scale = 1.0,
+        .flags = RAPL_COUNTER_FLAG_USE_MSR_SUM,
+         },
+       {
+        .feature_mask = RAPL_PKG_PERF_STATUS,
+        .perf_subsys = NULL,
+        .perf_name = NULL,
+        .msr = MSR_PKG_PERF_STATUS,
+        .msr_mask = 0xFFFFFFFFFFFFFFFF,
+        .msr_shift = 0,
+        .platform_rapl_msr_scale = &rapl_time_units,
+        .rci_index = RAPL_RCI_INDEX_PKG_PERF_STATUS,
+        .bic = BIC_PKG__,
+        .compat_scale = 100.0,
+        .flags = RAPL_COUNTER_FLAG_USE_MSR_SUM,
+         },
+       {
+        .feature_mask = RAPL_DRAM_PERF_STATUS,
+        .perf_subsys = NULL,
+        .perf_name = NULL,
+        .msr = MSR_DRAM_PERF_STATUS,
+        .msr_mask = 0xFFFFFFFFFFFFFFFF,
+        .msr_shift = 0,
+        .platform_rapl_msr_scale = &rapl_time_units,
+        .rci_index = RAPL_RCI_INDEX_DRAM_PERF_STATUS,
+        .bic = BIC_RAM__,
+        .compat_scale = 100.0,
+        .flags = RAPL_COUNTER_FLAG_USE_MSR_SUM,
+         },
+       {
+        .feature_mask = RAPL_AMD_F17H,
+        .perf_subsys = NULL,
+        .perf_name = NULL,
+        .msr = MSR_CORE_ENERGY_STAT,
+        .msr_mask = 0xFFFFFFFF,
+        .msr_shift = 0,
+        .platform_rapl_msr_scale = &rapl_energy_units,
+        .rci_index = RAPL_RCI_INDEX_CORE_ENERGY,
+        .bic = BIC_CorWatt | BIC_Cor_J,
+        .compat_scale = 1.0,
+        .flags = 0,
+         },
+};
+
+struct rapl_counter {
+       unsigned long long raw_value;
+       enum rapl_unit unit;
+       double scale;
+};
+
 struct thread_data {
        struct timeval tv_begin;
        struct timeval tv_end;
@@ -974,7 +1178,7 @@ struct core_data {
        unsigned long long c7;
        unsigned long long mc6_us;      /* duplicate as per-core for now, even though per module */
        unsigned int core_temp_c;
-       unsigned int core_energy;       /* MSR_CORE_ENERGY_STAT */
+       struct rapl_counter core_energy;        /* MSR_CORE_ENERGY_STAT */
        unsigned int core_id;
        unsigned long long core_throt_cnt;
        unsigned long long counter[MAX_ADDED_COUNTERS];
@@ -989,8 +1193,8 @@ struct pkg_data {
        unsigned long long pc8;
        unsigned long long pc9;
        unsigned long long pc10;
-       unsigned long long cpu_lpi;
-       unsigned long long sys_lpi;
+       long long cpu_lpi;
+       long long sys_lpi;
        unsigned long long pkg_wtd_core_c0;
        unsigned long long pkg_any_core_c0;
        unsigned long long pkg_any_gfxe_c0;
@@ -998,13 +1202,16 @@ struct pkg_data {
        long long gfx_rc6_ms;
        unsigned int gfx_mhz;
        unsigned int gfx_act_mhz;
+       long long sam_mc6_ms;
+       unsigned int sam_mhz;
+       unsigned int sam_act_mhz;
        unsigned int package_id;
-       unsigned long long energy_pkg;  /* MSR_PKG_ENERGY_STATUS */
-       unsigned long long energy_dram; /* MSR_DRAM_ENERGY_STATUS */
-       unsigned long long energy_cores;        /* MSR_PP0_ENERGY_STATUS */
-       unsigned long long energy_gfx;  /* MSR_PP1_ENERGY_STATUS */
-       unsigned long long rapl_pkg_perf_status;        /* MSR_PKG_PERF_STATUS */
-       unsigned long long rapl_dram_perf_status;       /* MSR_DRAM_PERF_STATUS */
+       struct rapl_counter energy_pkg; /* MSR_PKG_ENERGY_STATUS */
+       struct rapl_counter energy_dram;        /* MSR_DRAM_ENERGY_STATUS */
+       struct rapl_counter energy_cores;       /* MSR_PP0_ENERGY_STATUS */
+       struct rapl_counter energy_gfx; /* MSR_PP1_ENERGY_STATUS */
+       struct rapl_counter rapl_pkg_perf_status;       /* MSR_PKG_PERF_STATUS */
+       struct rapl_counter rapl_dram_perf_status;      /* MSR_DRAM_PERF_STATUS */
        unsigned int pkg_temp_c;
        unsigned int uncore_mhz;
        unsigned long long counter[MAX_ADDED_COUNTERS];
@@ -1150,6 +1357,38 @@ struct sys_counters {
        struct msr_counter *pp;
 } sys;
 
+void free_sys_counters(void)
+{
+       struct msr_counter *p = sys.tp, *pnext = NULL;
+
+       while (p) {
+               pnext = p->next;
+               free(p);
+               p = pnext;
+       }
+
+       p = sys.cp, pnext = NULL;
+       while (p) {
+               pnext = p->next;
+               free(p);
+               p = pnext;
+       }
+
+       p = sys.pp, pnext = NULL;
+       while (p) {
+               pnext = p->next;
+               free(p);
+               p = pnext;
+       }
+
+       sys.added_thread_counters = 0;
+       sys.added_core_counters = 0;
+       sys.added_package_counters = 0;
+       sys.tp = NULL;
+       sys.cp = NULL;
+       sys.pp = NULL;
+}
+
 struct system_summary {
        struct thread_data threads;
        struct core_data cores;
@@ -1280,34 +1519,60 @@ int get_msr_fd(int cpu)
        sprintf(pathname, "/dev/cpu/%d/msr", cpu);
        fd = open(pathname, O_RDONLY);
        if (fd < 0)
-               err(-1, "%s open failed, try chown or chmod +r /dev/cpu/*/msr, or run as root", pathname);
+               err(-1, "%s open failed, try chown or chmod +r /dev/cpu/*/msr, "
+                   "or run with --no-msr, or run as root", pathname);
 
        fd_percpu[cpu] = fd;
 
        return fd;
 }
 
+static void bic_disable_msr_access(void)
+{
+       const unsigned long bic_msrs =
+           BIC_SMI |
+           BIC_CPU_c1 |
+           BIC_CPU_c3 |
+           BIC_CPU_c6 |
+           BIC_CPU_c7 |
+           BIC_Mod_c6 |
+           BIC_CoreTmp |
+           BIC_Totl_c0 |
+           BIC_Any_c0 |
+           BIC_GFX_c0 |
+           BIC_CPUGFX |
+           BIC_Pkgpc2 | BIC_Pkgpc3 | BIC_Pkgpc6 | BIC_Pkgpc7 | BIC_Pkgpc8 | BIC_Pkgpc9 | BIC_Pkgpc10 | BIC_PkgTmp;
+
+       bic_enabled &= ~bic_msrs;
+
+       free_sys_counters();
+}
+
 static long perf_event_open(struct perf_event_attr *hw_event, pid_t pid, int cpu, int group_fd, unsigned long flags)
 {
+       assert(!no_perf);
+
        return syscall(__NR_perf_event_open, hw_event, pid, cpu, group_fd, flags);
 }
 
-static int perf_instr_count_open(int cpu_num)
+static long open_perf_counter(int cpu, unsigned int type, unsigned int config, int group_fd, __u64 read_format)
 {
-       struct perf_event_attr pea;
-       int fd;
+       struct perf_event_attr attr;
+       const pid_t pid = -1;
+       const unsigned long flags = 0;
 
-       memset(&pea, 0, sizeof(struct perf_event_attr));
-       pea.type = PERF_TYPE_HARDWARE;
-       pea.size = sizeof(struct perf_event_attr);
-       pea.config = PERF_COUNT_HW_INSTRUCTIONS;
+       assert(!no_perf);
 
-       /* counter for cpu_num, including user + kernel and all processes */
-       fd = perf_event_open(&pea, -1, cpu_num, -1, 0);
-       if (fd == -1) {
-               warnx("capget(CAP_PERFMON) failed, try \"# setcap cap_sys_admin=ep %s\"", progname);
-               BIC_NOT_PRESENT(BIC_IPC);
-       }
+       memset(&attr, 0, sizeof(struct perf_event_attr));
+
+       attr.type = type;
+       attr.size = sizeof(struct perf_event_attr);
+       attr.config = config;
+       attr.disabled = 0;
+       attr.sample_type = PERF_SAMPLE_IDENTIFIER;
+       attr.read_format = read_format;
+
+       const int fd = perf_event_open(&attr, pid, cpu, group_fd, flags);
 
        return fd;
 }
@@ -1317,7 +1582,7 @@ int get_instr_count_fd(int cpu)
        if (fd_instr_count_percpu[cpu])
                return fd_instr_count_percpu[cpu];
 
-       fd_instr_count_percpu[cpu] = perf_instr_count_open(cpu);
+       fd_instr_count_percpu[cpu] = open_perf_counter(cpu, PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS, -1, 0);
 
        return fd_instr_count_percpu[cpu];
 }
@@ -1326,6 +1591,8 @@ int get_msr(int cpu, off_t offset, unsigned long long *msr)
 {
        ssize_t retval;
 
+       assert(!no_msr);
+
        retval = pread(get_msr_fd(cpu), msr, sizeof(*msr), offset);
 
        if (retval != sizeof *msr)
@@ -1334,6 +1601,21 @@ int get_msr(int cpu, off_t offset, unsigned long long *msr)
        return 0;
 }
 
+int probe_msr(int cpu, off_t offset)
+{
+       ssize_t retval;
+       unsigned long long dummy;
+
+       assert(!no_msr);
+
+       retval = pread(get_msr_fd(cpu), &dummy, sizeof(dummy), offset);
+
+       if (retval != sizeof(dummy))
+               return 1;
+
+       return 0;
+}
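The probe_msr() helper above tests whether an MSR exists by attempting a throwaway pread() at the register's offset on the per-CPU MSR device and checking only whether the full 8 bytes came back. A minimal standalone sketch of the same probe pattern (the `probe_offset` helper is hypothetical, standing in for probe_msr() with an ordinary file descriptor instead of /dev/cpu/*/msr):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Return 0 if a full 8-byte read at 'offset' succeeds, 1 otherwise --
 * the same convention probe_msr() uses to report MSR presence. */
int probe_offset(int fd, off_t offset)
{
	unsigned long long dummy;

	if (pread(fd, &dummy, sizeof(dummy), offset) != sizeof(dummy))
		return 1;
	return 0;
}
```

The read value is deliberately discarded; only the success of the read matters, which is why the real helper can run before any counter bookkeeping is set up.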
+
 #define MAX_DEFERRED 16
 char *deferred_add_names[MAX_DEFERRED];
 char *deferred_skip_names[MAX_DEFERRED];
@@ -1369,6 +1651,8 @@ void help(void)
                "               Override default 5-second measurement interval\n"
                "  -J, --Joules displays energy in Joules instead of Watts\n"
                "  -l, --list   list column headers only\n"
+               "  -M, --no-msr Disable all uses of the MSR driver\n"
+               "  -P, --no-perf Disable all uses of the perf API\n"
                "  -n, --num_iterations num\n"
                "               number of the measurement iterations\n"
                "  -N, --header_iterations num\n"
@@ -1573,6 +1857,15 @@ void print_header(char *delim)
        if (DO_BIC(BIC_GFXACTMHz))
                outp += sprintf(outp, "%sGFXAMHz", (printed++ ? delim : ""));
 
+       if (DO_BIC(BIC_SAM_mc6))
+               outp += sprintf(outp, "%sSAM%%mc6", (printed++ ? delim : ""));
+
+       if (DO_BIC(BIC_SAMMHz))
+               outp += sprintf(outp, "%sSAMMHz", (printed++ ? delim : ""));
+
+       if (DO_BIC(BIC_SAMACTMHz))
+               outp += sprintf(outp, "%sSAMAMHz", (printed++ ? delim : ""));
+
        if (DO_BIC(BIC_Totl_c0))
                outp += sprintf(outp, "%sTotl%%C0", (printed++ ? delim : ""));
        if (DO_BIC(BIC_Any_c0))
@@ -1671,26 +1964,35 @@ int dump_counters(struct thread_data *t, struct core_data *c, struct pkg_data *p
                        outp += sprintf(outp, "SMI: %d\n", t->smi_count);
 
                for (i = 0, mp = sys.tp; mp; i++, mp = mp->next) {
-                       outp += sprintf(outp, "tADDED [%d] msr0x%x: %08llX\n", i, mp->msr_num, t->counter[i]);
+                       outp +=
+                           sprintf(outp, "tADDED [%d] %8s msr0x%x: %08llX %s\n", i, mp->name, mp->msr_num,
+                                   t->counter[i], mp->path);
                }
        }
 
-       if (c) {
+       if (c && is_cpu_first_thread_in_core(t, c, p)) {
                outp += sprintf(outp, "core: %d\n", c->core_id);
                outp += sprintf(outp, "c3: %016llX\n", c->c3);
                outp += sprintf(outp, "c6: %016llX\n", c->c6);
                outp += sprintf(outp, "c7: %016llX\n", c->c7);
                outp += sprintf(outp, "DTS: %dC\n", c->core_temp_c);
                outp += sprintf(outp, "cpu_throt_count: %016llX\n", c->core_throt_cnt);
-               outp += sprintf(outp, "Joules: %0X\n", c->core_energy);
+
+               const unsigned long long energy_value = c->core_energy.raw_value * c->core_energy.scale;
+               const double energy_scale = c->core_energy.scale;
+
+               if (c->core_energy.unit == RAPL_UNIT_JOULES)
+                       outp += sprintf(outp, "Joules: %0llX (scale: %lf)\n", energy_value, energy_scale);
 
                for (i = 0, mp = sys.cp; mp; i++, mp = mp->next) {
-                       outp += sprintf(outp, "cADDED [%d] msr0x%x: %08llX\n", i, mp->msr_num, c->counter[i]);
+                       outp +=
+                           sprintf(outp, "cADDED [%d] %8s msr0x%x: %08llX %s\n", i, mp->name, mp->msr_num,
+                                   c->counter[i], mp->path);
                }
                outp += sprintf(outp, "mc6_us: %016llX\n", c->mc6_us);
        }
 
-       if (p) {
+       if (p && is_cpu_first_core_in_package(t, c, p)) {
                outp += sprintf(outp, "package: %d\n", p->package_id);
 
                outp += sprintf(outp, "Weighted cores: %016llX\n", p->pkg_wtd_core_c0);
@@ -1710,16 +2012,18 @@ int dump_counters(struct thread_data *t, struct core_data *c, struct pkg_data *p
                outp += sprintf(outp, "pc10: %016llX\n", p->pc10);
                outp += sprintf(outp, "cpu_lpi: %016llX\n", p->cpu_lpi);
                outp += sprintf(outp, "sys_lpi: %016llX\n", p->sys_lpi);
-               outp += sprintf(outp, "Joules PKG: %0llX\n", p->energy_pkg);
-               outp += sprintf(outp, "Joules COR: %0llX\n", p->energy_cores);
-               outp += sprintf(outp, "Joules GFX: %0llX\n", p->energy_gfx);
-               outp += sprintf(outp, "Joules RAM: %0llX\n", p->energy_dram);
-               outp += sprintf(outp, "Throttle PKG: %0llX\n", p->rapl_pkg_perf_status);
-               outp += sprintf(outp, "Throttle RAM: %0llX\n", p->rapl_dram_perf_status);
+               outp += sprintf(outp, "Joules PKG: %0llX\n", p->energy_pkg.raw_value);
+               outp += sprintf(outp, "Joules COR: %0llX\n", p->energy_cores.raw_value);
+               outp += sprintf(outp, "Joules GFX: %0llX\n", p->energy_gfx.raw_value);
+               outp += sprintf(outp, "Joules RAM: %0llX\n", p->energy_dram.raw_value);
+               outp += sprintf(outp, "Throttle PKG: %0llX\n", p->rapl_pkg_perf_status.raw_value);
+               outp += sprintf(outp, "Throttle RAM: %0llX\n", p->rapl_dram_perf_status.raw_value);
                outp += sprintf(outp, "PTM: %dC\n", p->pkg_temp_c);
 
                for (i = 0, mp = sys.pp; mp; i++, mp = mp->next) {
-                       outp += sprintf(outp, "pADDED [%d] msr0x%x: %08llX\n", i, mp->msr_num, p->counter[i]);
+                       outp +=
+                           sprintf(outp, "pADDED [%d] %8s msr0x%x: %08llX %s\n", i, mp->name, mp->msr_num,
+                                   p->counter[i], mp->path);
                }
        }
 
@@ -1728,6 +2032,23 @@ int dump_counters(struct thread_data *t, struct core_data *c, struct pkg_data *p
        return 0;
 }
 
+double rapl_counter_get_value(const struct rapl_counter *c, enum rapl_unit desired_unit, double interval)
+{
+       assert(desired_unit != RAPL_UNIT_INVALID);
+
+       /*
+        * For now we don't expect anything other than joules,
+        * so just simplify the logic.
+        */
+       assert(c->unit == RAPL_UNIT_JOULES);
+
+       const double scaled = c->raw_value * c->scale;
+
+       if (desired_unit == RAPL_UNIT_WATTS)
+               return scaled / interval;
+       return scaled;
+}
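rapl_counter_get_value() converts a raw RAPL reading into Joules (raw * scale) or Watts (raw * scale / interval). The arithmetic in isolation, with the struct and enum reduced to what the math needs (the names `rapl_to_unit` and the trimmed struct are illustrative, not the patch's exact definitions):

```c
#include <assert.h>

enum rapl_unit { RAPL_UNIT_INVALID, RAPL_UNIT_JOULES, RAPL_UNIT_WATTS };

struct rapl_counter {
	unsigned long long raw_value;
	enum rapl_unit unit;
	double scale;
};

/* Same conversion as rapl_counter_get_value(): scale the raw energy
 * reading into Joules, then divide by the measurement interval when
 * power (Watts) is requested. */
double rapl_to_unit(const struct rapl_counter *c, enum rapl_unit want, double interval)
{
	double scaled = c->raw_value * c->scale;

	if (want == RAPL_UNIT_WATTS)
		return scaled / interval;
	return scaled;
}
```

Keeping raw_value and scale separate until display time is what lets the delta and accumulate paths work on integer raw values and defer floating-point conversion to format_counters().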
+
 /*
  * column formatting convention & formats
  */
@@ -1921,9 +2242,11 @@ int format_counters(struct thread_data *t, struct core_data *c, struct pkg_data
 
        if (DO_BIC(BIC_CorWatt) && platform->has_per_core_rapl)
                outp +=
-                   sprintf(outp, fmt8, (printed++ ? delim : ""), c->core_energy * rapl_energy_units / interval_float);
+                   sprintf(outp, fmt8, (printed++ ? delim : ""),
+                           rapl_counter_get_value(&c->core_energy, RAPL_UNIT_WATTS, interval_float));
        if (DO_BIC(BIC_Cor_J) && platform->has_per_core_rapl)
-               outp += sprintf(outp, fmt8, (printed++ ? delim : ""), c->core_energy * rapl_energy_units);
+               outp += sprintf(outp, fmt8, (printed++ ? delim : ""),
+                               rapl_counter_get_value(&c->core_energy, RAPL_UNIT_JOULES, interval_float));
 
        /* print per-package data only for 1st core in package */
        if (!is_cpu_first_core_in_package(t, c, p))
@@ -1951,6 +2274,24 @@ int format_counters(struct thread_data *t, struct core_data *c, struct pkg_data
        if (DO_BIC(BIC_GFXACTMHz))
                outp += sprintf(outp, "%s%d", (printed++ ? delim : ""), p->gfx_act_mhz);
 
+       /* SAMmc6 */
+       if (DO_BIC(BIC_SAM_mc6)) {
+               if (p->sam_mc6_ms == -1) {      /* detect GFX counter reset */
+                       outp += sprintf(outp, "%s**.**", (printed++ ? delim : ""));
+               } else {
+                       outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""),
+                                       p->sam_mc6_ms / 10.0 / interval_float);
+               }
+       }
+
+       /* SAMMHz */
+       if (DO_BIC(BIC_SAMMHz))
+               outp += sprintf(outp, "%s%d", (printed++ ? delim : ""), p->sam_mhz);
+
+       /* SAMACTMHz */
+       if (DO_BIC(BIC_SAMACTMHz))
+               outp += sprintf(outp, "%s%d", (printed++ ? delim : ""), p->sam_act_mhz);
+
        /* Totl%C0, Any%C0 GFX%C0 CPUGFX% */
        if (DO_BIC(BIC_Totl_c0))
                outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), 100.0 * p->pkg_wtd_core_c0 / tsc);
@@ -1976,43 +2317,59 @@ int format_counters(struct thread_data *t, struct core_data *c, struct pkg_data
        if (DO_BIC(BIC_Pkgpc10))
                outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), 100.0 * p->pc10 / tsc);
 
-       if (DO_BIC(BIC_CPU_LPI))
-               outp +=
-                   sprintf(outp, "%s%.2f", (printed++ ? delim : ""), 100.0 * p->cpu_lpi / 1000000.0 / interval_float);
-       if (DO_BIC(BIC_SYS_LPI))
-               outp +=
-                   sprintf(outp, "%s%.2f", (printed++ ? delim : ""), 100.0 * p->sys_lpi / 1000000.0 / interval_float);
+       if (DO_BIC(BIC_CPU_LPI)) {
+               if (p->cpu_lpi >= 0)
+                       outp +=
+                           sprintf(outp, "%s%.2f", (printed++ ? delim : ""),
+                                   100.0 * p->cpu_lpi / 1000000.0 / interval_float);
+               else
+                       outp += sprintf(outp, "%s(neg)", (printed++ ? delim : ""));
+       }
+       if (DO_BIC(BIC_SYS_LPI)) {
+               if (p->sys_lpi >= 0)
+                       outp +=
+                           sprintf(outp, "%s%.2f", (printed++ ? delim : ""),
+                                   100.0 * p->sys_lpi / 1000000.0 / interval_float);
+               else
+                       outp += sprintf(outp, "%s(neg)", (printed++ ? delim : ""));
+       }
 
        if (DO_BIC(BIC_PkgWatt))
                outp +=
-                   sprintf(outp, fmt8, (printed++ ? delim : ""), p->energy_pkg * rapl_energy_units / interval_float);
-
+                   sprintf(outp, fmt8, (printed++ ? delim : ""),
+                           rapl_counter_get_value(&p->energy_pkg, RAPL_UNIT_WATTS, interval_float));
        if (DO_BIC(BIC_CorWatt) && !platform->has_per_core_rapl)
                outp +=
-                   sprintf(outp, fmt8, (printed++ ? delim : ""), p->energy_cores * rapl_energy_units / interval_float);
+                   sprintf(outp, fmt8, (printed++ ? delim : ""),
+                           rapl_counter_get_value(&p->energy_cores, RAPL_UNIT_WATTS, interval_float));
        if (DO_BIC(BIC_GFXWatt))
                outp +=
-                   sprintf(outp, fmt8, (printed++ ? delim : ""), p->energy_gfx * rapl_energy_units / interval_float);
+                   sprintf(outp, fmt8, (printed++ ? delim : ""),
+                           rapl_counter_get_value(&p->energy_gfx, RAPL_UNIT_WATTS, interval_float));
        if (DO_BIC(BIC_RAMWatt))
                outp +=
                    sprintf(outp, fmt8, (printed++ ? delim : ""),
-                           p->energy_dram * rapl_dram_energy_units / interval_float);
+                           rapl_counter_get_value(&p->energy_dram, RAPL_UNIT_WATTS, interval_float));
        if (DO_BIC(BIC_Pkg_J))
-               outp += sprintf(outp, fmt8, (printed++ ? delim : ""), p->energy_pkg * rapl_energy_units);
+               outp += sprintf(outp, fmt8, (printed++ ? delim : ""),
+                               rapl_counter_get_value(&p->energy_pkg, RAPL_UNIT_JOULES, interval_float));
        if (DO_BIC(BIC_Cor_J) && !platform->has_per_core_rapl)
-               outp += sprintf(outp, fmt8, (printed++ ? delim : ""), p->energy_cores * rapl_energy_units);
+               outp += sprintf(outp, fmt8, (printed++ ? delim : ""),
+                               rapl_counter_get_value(&p->energy_cores, RAPL_UNIT_JOULES, interval_float));
        if (DO_BIC(BIC_GFX_J))
-               outp += sprintf(outp, fmt8, (printed++ ? delim : ""), p->energy_gfx * rapl_energy_units);
+               outp += sprintf(outp, fmt8, (printed++ ? delim : ""),
+                               rapl_counter_get_value(&p->energy_gfx, RAPL_UNIT_JOULES, interval_float));
        if (DO_BIC(BIC_RAM_J))
-               outp += sprintf(outp, fmt8, (printed++ ? delim : ""), p->energy_dram * rapl_dram_energy_units);
+               outp += sprintf(outp, fmt8, (printed++ ? delim : ""),
+                               rapl_counter_get_value(&p->energy_dram, RAPL_UNIT_JOULES, interval_float));
        if (DO_BIC(BIC_PKG__))
                outp +=
                    sprintf(outp, fmt8, (printed++ ? delim : ""),
-                           100.0 * p->rapl_pkg_perf_status * rapl_time_units / interval_float);
+                           rapl_counter_get_value(&p->rapl_pkg_perf_status, RAPL_UNIT_WATTS, interval_float));
        if (DO_BIC(BIC_RAM__))
                outp +=
                    sprintf(outp, fmt8, (printed++ ? delim : ""),
-                           100.0 * p->rapl_dram_perf_status * rapl_time_units / interval_float);
+                           rapl_counter_get_value(&p->rapl_dram_perf_status, RAPL_UNIT_WATTS, interval_float));
        /* UncMHz */
        if (DO_BIC(BIC_UNCORE_MHZ))
                outp += sprintf(outp, "%s%d", (printed++ ? delim : ""), p->uncore_mhz);
@@ -2121,12 +2478,22 @@ int delta_package(struct pkg_data *new, struct pkg_data *old)
        old->gfx_mhz = new->gfx_mhz;
        old->gfx_act_mhz = new->gfx_act_mhz;
 
-       old->energy_pkg = new->energy_pkg - old->energy_pkg;
-       old->energy_cores = new->energy_cores - old->energy_cores;
-       old->energy_gfx = new->energy_gfx - old->energy_gfx;
-       old->energy_dram = new->energy_dram - old->energy_dram;
-       old->rapl_pkg_perf_status = new->rapl_pkg_perf_status - old->rapl_pkg_perf_status;
-       old->rapl_dram_perf_status = new->rapl_dram_perf_status - old->rapl_dram_perf_status;
+       /* flag an error when mc6 counter resets/wraps */
+       if (old->sam_mc6_ms > new->sam_mc6_ms)
+               old->sam_mc6_ms = -1;
+       else
+               old->sam_mc6_ms = new->sam_mc6_ms - old->sam_mc6_ms;
+
+       old->sam_mhz = new->sam_mhz;
+       old->sam_act_mhz = new->sam_act_mhz;
+
+       old->energy_pkg.raw_value = new->energy_pkg.raw_value - old->energy_pkg.raw_value;
+       old->energy_cores.raw_value = new->energy_cores.raw_value - old->energy_cores.raw_value;
+       old->energy_gfx.raw_value = new->energy_gfx.raw_value - old->energy_gfx.raw_value;
+       old->energy_dram.raw_value = new->energy_dram.raw_value - old->energy_dram.raw_value;
+       old->rapl_pkg_perf_status.raw_value = new->rapl_pkg_perf_status.raw_value - old->rapl_pkg_perf_status.raw_value;
+       old->rapl_dram_perf_status.raw_value =
+           new->rapl_dram_perf_status.raw_value - old->rapl_dram_perf_status.raw_value;
 
        for (i = 0, mp = sys.pp; mp; i++, mp = mp->next) {
                if (mp->format == FORMAT_RAW)
@@ -2150,7 +2517,7 @@ void delta_core(struct core_data *new, struct core_data *old)
        old->core_throt_cnt = new->core_throt_cnt;
        old->mc6_us = new->mc6_us - old->mc6_us;
 
-       DELTA_WRAP32(new->core_energy, old->core_energy);
+       DELTA_WRAP32(new->core_energy.raw_value, old->core_energy.raw_value);
 
        for (i = 0, mp = sys.cp; mp; i++, mp = mp->next) {
                if (mp->format == FORMAT_RAW)
@@ -2277,6 +2644,13 @@ int delta_cpu(struct thread_data *t, struct core_data *c,
        return retval;
 }
 
+void rapl_counter_clear(struct rapl_counter *c)
+{
+       c->raw_value = 0;
+       c->scale = 0.0;
+       c->unit = RAPL_UNIT_INVALID;
+}
+
 void clear_counters(struct thread_data *t, struct core_data *c, struct pkg_data *p)
 {
        int i;
@@ -2304,7 +2678,7 @@ void clear_counters(struct thread_data *t, struct core_data *c, struct pkg_data
        c->c7 = 0;
        c->mc6_us = 0;
        c->core_temp_c = 0;
-       c->core_energy = 0;
+       rapl_counter_clear(&c->core_energy);
        c->core_throt_cnt = 0;
 
        p->pkg_wtd_core_c0 = 0;
@@ -2325,18 +2699,21 @@ void clear_counters(struct thread_data *t, struct core_data *c, struct pkg_data
        p->cpu_lpi = 0;
        p->sys_lpi = 0;
 
-       p->energy_pkg = 0;
-       p->energy_dram = 0;
-       p->energy_cores = 0;
-       p->energy_gfx = 0;
-       p->rapl_pkg_perf_status = 0;
-       p->rapl_dram_perf_status = 0;
+       rapl_counter_clear(&p->energy_pkg);
+       rapl_counter_clear(&p->energy_dram);
+       rapl_counter_clear(&p->energy_cores);
+       rapl_counter_clear(&p->energy_gfx);
+       rapl_counter_clear(&p->rapl_pkg_perf_status);
+       rapl_counter_clear(&p->rapl_dram_perf_status);
        p->pkg_temp_c = 0;
 
        p->gfx_rc6_ms = 0;
        p->uncore_mhz = 0;
        p->gfx_mhz = 0;
        p->gfx_act_mhz = 0;
+       p->sam_mc6_ms = 0;
+       p->sam_mhz = 0;
+       p->sam_act_mhz = 0;
        for (i = 0, mp = sys.tp; mp; i++, mp = mp->next)
                t->counter[i] = 0;
 
@@ -2347,6 +2724,20 @@ void clear_counters(struct thread_data *t, struct core_data *c, struct pkg_data
                p->counter[i] = 0;
 }
 
+void rapl_counter_accumulate(struct rapl_counter *dst, const struct rapl_counter *src)
+{
+       /* Copy unit and scale from src if dst is not initialized */
+       if (dst->unit == RAPL_UNIT_INVALID) {
+               dst->unit = src->unit;
+               dst->scale = src->scale;
+       }
+
+       assert(dst->unit == src->unit);
+       assert(dst->scale == src->scale);
+
+       dst->raw_value += src->raw_value;
+}
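rapl_counter_accumulate() adopts the unit and scale from the first initialized source, then insists every later source matches before summing raw values. A self-contained sketch of that contract (the `accumulate` name and trimmed struct are illustrative):

```c
#include <assert.h>

enum rapl_unit { RAPL_UNIT_INVALID, RAPL_UNIT_JOULES };

struct rapl_counter {
	unsigned long long raw_value;
	enum rapl_unit unit;
	double scale;
};

/* Mirror of rapl_counter_accumulate(): adopt unit/scale from the
 * first source when dst is still RAPL_UNIT_INVALID (as left by
 * rapl_counter_clear()), then require every later source to match
 * before adding its raw value. */
void accumulate(struct rapl_counter *dst, const struct rapl_counter *src)
{
	if (dst->unit == RAPL_UNIT_INVALID) {
		dst->unit = src->unit;
		dst->scale = src->scale;
	}

	assert(dst->unit == src->unit);
	assert(dst->scale == src->scale);

	dst->raw_value += src->raw_value;
}
```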
+
 int sum_counters(struct thread_data *t, struct core_data *c, struct pkg_data *p)
 {
        int i;
@@ -2393,7 +2784,7 @@ int sum_counters(struct thread_data *t, struct core_data *c, struct pkg_data *p)
        average.cores.core_temp_c = MAX(average.cores.core_temp_c, c->core_temp_c);
        average.cores.core_throt_cnt = MAX(average.cores.core_throt_cnt, c->core_throt_cnt);
 
-       average.cores.core_energy += c->core_energy;
+       rapl_counter_accumulate(&average.cores.core_energy, &c->core_energy);
 
        for (i = 0, mp = sys.cp; mp; i++, mp = mp->next) {
                if (mp->format == FORMAT_RAW)
@@ -2428,25 +2819,29 @@ int sum_counters(struct thread_data *t, struct core_data *c, struct pkg_data *p)
        average.packages.cpu_lpi = p->cpu_lpi;
        average.packages.sys_lpi = p->sys_lpi;
 
-       average.packages.energy_pkg += p->energy_pkg;
-       average.packages.energy_dram += p->energy_dram;
-       average.packages.energy_cores += p->energy_cores;
-       average.packages.energy_gfx += p->energy_gfx;
+       rapl_counter_accumulate(&average.packages.energy_pkg, &p->energy_pkg);
+       rapl_counter_accumulate(&average.packages.energy_dram, &p->energy_dram);
+       rapl_counter_accumulate(&average.packages.energy_cores, &p->energy_cores);
+       rapl_counter_accumulate(&average.packages.energy_gfx, &p->energy_gfx);
 
        average.packages.gfx_rc6_ms = p->gfx_rc6_ms;
        average.packages.uncore_mhz = p->uncore_mhz;
        average.packages.gfx_mhz = p->gfx_mhz;
        average.packages.gfx_act_mhz = p->gfx_act_mhz;
+       average.packages.sam_mc6_ms = p->sam_mc6_ms;
+       average.packages.sam_mhz = p->sam_mhz;
+       average.packages.sam_act_mhz = p->sam_act_mhz;
 
        average.packages.pkg_temp_c = MAX(average.packages.pkg_temp_c, p->pkg_temp_c);
 
-       average.packages.rapl_pkg_perf_status += p->rapl_pkg_perf_status;
-       average.packages.rapl_dram_perf_status += p->rapl_dram_perf_status;
+       rapl_counter_accumulate(&average.packages.rapl_pkg_perf_status, &p->rapl_pkg_perf_status);
+       rapl_counter_accumulate(&average.packages.rapl_dram_perf_status, &p->rapl_dram_perf_status);
 
        for (i = 0, mp = sys.pp; mp; i++, mp = mp->next) {
-               if (mp->format == FORMAT_RAW)
-                       continue;
-               average.packages.counter[i] += p->counter[i];
+               if ((mp->format == FORMAT_RAW) && (topo.num_packages == 0))
+                       average.packages.counter[i] = p->counter[i];
+               else
+                       average.packages.counter[i] += p->counter[i];
        }
        return 0;
 }
@@ -2578,6 +2973,7 @@ unsigned long long snapshot_sysfs_counter(char *path)
 int get_mp(int cpu, struct msr_counter *mp, unsigned long long *counterp)
 {
        if (mp->msr_num != 0) {
+               assert(!no_msr);
                if (get_msr(cpu, mp->msr_num, counterp))
                        return -1;
        } else {
@@ -2599,7 +2995,7 @@ unsigned long long get_uncore_mhz(int package, int die)
 {
        char path[128];
 
-       sprintf(path, "/sys/devices/system/cpu/intel_uncore_frequency/package_0%d_die_0%d/current_freq_khz", package,
+       sprintf(path, "/sys/devices/system/cpu/intel_uncore_frequency/package_%02d_die_%02d/current_freq_khz", package,
                die);
 
        return (snapshot_sysfs_counter(path) / 1000);
@@ -2627,6 +3023,9 @@ int get_epb(int cpu)
        return epb;
 
 msr_fallback:
+       if (no_msr)
+               return -1;
+
        get_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS, &msr);
 
        return msr & 0xf;
@@ -2700,187 +3099,495 @@ int get_core_throt_cnt(int cpu, unsigned long long *cnt)
        return 0;
 }
 
-/*
- * get_counters(...)
- * migrate to cpu
- * acquire and record local counters for that cpu
- */
-int get_counters(struct thread_data *t, struct core_data *c, struct pkg_data *p)
+struct amperf_group_fd {
+       int aperf;              /* Also the group descriptor */
+       int mperf;
+};
+
+static int read_perf_counter_info(const char *const path, const char *const parse_format, void *value_ptr)
 {
-       int cpu = t->cpu_id;
-       unsigned long long msr;
-       int aperf_mperf_retry_count = 0;
-       struct msr_counter *mp;
-       int i;
+       int fdmt;
+       int bytes_read;
+       char buf[64];
+       int ret = -1;
 
-       if (cpu_migrate(cpu)) {
-               fprintf(outf, "get_counters: Could not migrate to CPU %d\n", cpu);
-               return -1;
+       fdmt = open(path, O_RDONLY, 0);
+       if (fdmt == -1) {
+               if (debug)
+                       fprintf(stderr, "Failed to parse perf counter info %s\n", path);
+               ret = -1;
+               goto cleanup_and_exit;
        }
 
-       gettimeofday(&t->tv_begin, (struct timezone *)NULL);
+       bytes_read = read(fdmt, buf, sizeof(buf) - 1);
+       if (bytes_read <= 0 || bytes_read >= (int)sizeof(buf)) {
+               if (debug)
+                       fprintf(stderr, "Failed to parse perf counter info %s\n", path);
+               ret = -1;
+               goto cleanup_and_exit;
+       }
 
-       if (first_counter_read)
-               get_apic_id(t);
-retry:
-       t->tsc = rdtsc();       /* we are running on local CPU of interest */
+       buf[bytes_read] = '\0';
 
-       if (DO_BIC(BIC_Avg_MHz) || DO_BIC(BIC_Busy) || DO_BIC(BIC_Bzy_MHz) || DO_BIC(BIC_IPC)
-           || soft_c1_residency_display(BIC_Avg_MHz)) {
-               unsigned long long tsc_before, tsc_between, tsc_after, aperf_time, mperf_time;
+       if (sscanf(buf, parse_format, value_ptr) != 1) {
+               if (debug)
+                       fprintf(stderr, "Failed to parse perf counter info %s\n", path);
+               ret = -1;
+               goto cleanup_and_exit;
+       }
 
-               /*
-                * The TSC, APERF and MPERF must be read together for
-                * APERF/MPERF and MPERF/TSC to give accurate results.
-                *
-                * Unfortunately, APERF and MPERF are read by
-                * individual system call, so delays may occur
-                * between them.  If the time to read them
-                * varies by a large amount, we re-read them.
-                */
+       ret = 0;
 
-               /*
-                * This initial dummy APERF read has been seen to
-                * reduce jitter in the subsequent reads.
-                */
+cleanup_and_exit:
+       close(fdmt);
+       return ret;
+}
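read_perf_counter_info() reads one short sysfs attribute into a buffer and parses it with a caller-supplied scanf format: "%u" for the PMU type file, "event=%x" for the event encoding files. The parsing step in isolation (the `parse_event_config` helper is hypothetical, showing only the sscanf convention the callers above rely on):

```c
#include <assert.h>
#include <stdio.h>

/* Parse a sysfs event attribute string the way read_aperf_config()
 * and read_rapl_config() do: "event=%x" extracts the hex event id
 * from text such as "event=e8\n". Returns 0 on success, -1 if the
 * string does not match the expected format. */
int parse_event_config(const char *buf, unsigned int *config)
{
	if (sscanf(buf, "event=%x", config) != 1)
		return -1;
	return 0;
}
```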
 
-               if (get_msr(cpu, MSR_IA32_APERF, &t->aperf))
-                       return -3;
+static unsigned int read_perf_counter_info_n(const char *const path, const char *const parse_format)
+{
+       unsigned int v;
+       int status;
 
-               t->tsc = rdtsc();       /* re-read close to APERF */
+       status = read_perf_counter_info(path, parse_format, &v);
+       if (status)
+               v = -1;
 
-               tsc_before = t->tsc;
+       return v;
+}
 
-               if (get_msr(cpu, MSR_IA32_APERF, &t->aperf))
-                       return -3;
+static unsigned int read_msr_type(void)
+{
+       const char *const path = "/sys/bus/event_source/devices/msr/type";
+       const char *const format = "%u";
 
-               tsc_between = rdtsc();
+       return read_perf_counter_info_n(path, format);
+}
 
-               if (get_msr(cpu, MSR_IA32_MPERF, &t->mperf))
-                       return -4;
+static unsigned int read_aperf_config(void)
+{
+       const char *const path = "/sys/bus/event_source/devices/msr/events/aperf";
+       const char *const format = "event=%x";
 
-               tsc_after = rdtsc();
+       return read_perf_counter_info_n(path, format);
+}
 
-               aperf_time = tsc_between - tsc_before;
-               mperf_time = tsc_after - tsc_between;
+static unsigned int read_mperf_config(void)
+{
+       const char *const path = "/sys/bus/event_source/devices/msr/events/mperf";
+       const char *const format = "event=%x";
 
-               /*
-                * If the system call latency to read APERF and MPERF
-                * differ by more than 2x, then try again.
-                */
-               if ((aperf_time > (2 * mperf_time)) || (mperf_time > (2 * aperf_time))) {
-                       aperf_mperf_retry_count++;
-                       if (aperf_mperf_retry_count < 5)
-                               goto retry;
-                       else
-                               warnx("cpu%d jitter %lld %lld", cpu, aperf_time, mperf_time);
-               }
-               aperf_mperf_retry_count = 0;
+       return read_perf_counter_info_n(path, format);
+}
 
-               t->aperf = t->aperf * aperf_mperf_multiplier;
-               t->mperf = t->mperf * aperf_mperf_multiplier;
-       }
+static unsigned int read_perf_type(const char *subsys)
+{
+       const char *const path_format = "/sys/bus/event_source/devices/%s/type";
+       const char *const format = "%u";
+       char path[128];
 
-       if (DO_BIC(BIC_IPC))
-               if (read(get_instr_count_fd(cpu), &t->instr_count, sizeof(long long)) != sizeof(long long))
-                       return -4;
+       snprintf(path, sizeof(path), path_format, subsys);
 
-       if (DO_BIC(BIC_IRQ))
-               t->irq_count = irqs_per_cpu[cpu];
-       if (DO_BIC(BIC_SMI)) {
-               if (get_msr(cpu, MSR_SMI_COUNT, &msr))
-                       return -5;
-               t->smi_count = msr & 0xFFFFFFFF;
-       }
-       if (DO_BIC(BIC_CPU_c1) && platform->has_msr_core_c1_res) {
-               if (get_msr(cpu, MSR_CORE_C1_RES, &t->c1))
-                       return -6;
-       }
+       return read_perf_counter_info_n(path, format);
+}
 
-       for (i = 0, mp = sys.tp; mp; i++, mp = mp->next) {
-               if (get_mp(cpu, mp, &t->counter[i]))
-                       return -10;
-       }
+static unsigned int read_rapl_config(const char *subsys, const char *event_name)
+{
+       const char *const path_format = "/sys/bus/event_source/devices/%s/events/%s";
+       const char *const format = "event=%x";
+       char path[128];
 
-       /* collect core counters only for 1st thread in core */
-       if (!is_cpu_first_thread_in_core(t, c, p))
-               goto done;
+       snprintf(path, sizeof(path), path_format, subsys, event_name);
 
-       if (DO_BIC(BIC_CPU_c3) || soft_c1_residency_display(BIC_CPU_c3)) {
-               if (get_msr(cpu, MSR_CORE_C3_RESIDENCY, &c->c3))
-                       return -6;
-       }
+       return read_perf_counter_info_n(path, format);
+}
 
-       if ((DO_BIC(BIC_CPU_c6) || soft_c1_residency_display(BIC_CPU_c6)) && !platform->has_msr_knl_core_c6_residency) {
-               if (get_msr(cpu, MSR_CORE_C6_RESIDENCY, &c->c6))
-                       return -7;
-       } else if (platform->has_msr_knl_core_c6_residency && soft_c1_residency_display(BIC_CPU_c6)) {
-               if (get_msr(cpu, MSR_KNL_CORE_C6_RESIDENCY, &c->c6))
-                       return -7;
-       }
+static unsigned int read_perf_rapl_unit(const char *subsys, const char *event_name)
+{
+       const char *const path_format = "/sys/bus/event_source/devices/%s/events/%s.unit";
+       const char *const format = "%s";
+       char path[128];
+       char unit_buffer[16];
 
-       if (DO_BIC(BIC_CPU_c7) || soft_c1_residency_display(BIC_CPU_c7)) {
-               if (get_msr(cpu, MSR_CORE_C7_RESIDENCY, &c->c7))
-                       return -8;
-               else if (t->is_atom) {
-                       /*
-                        * For Atom CPUs that have core cstates deeper than c6,
-                        * MSR_CORE_C6_RESIDENCY returns residency of cc6 and deeper.
-                        * Subtract CC7 (and deeper cstates) residency to get
-                        * accurate cc6 residency.
-                        */

-                       c->c6 -= c->c7;
-               }
-       }
+       snprintf(path, sizeof(path), path_format, subsys, event_name);
 
-       if (DO_BIC(BIC_Mod_c6))
-               if (get_msr(cpu, MSR_MODULE_C6_RES_MS, &c->mc6_us))
-                       return -8;
+       read_perf_counter_info(path, format, &unit_buffer);
+       if (strcmp("Joules", unit_buffer) == 0)
+               return RAPL_UNIT_JOULES;
 
-       if (DO_BIC(BIC_CoreTmp)) {
-               if (get_msr(cpu, MSR_IA32_THERM_STATUS, &msr))
-                       return -9;
-               c->core_temp_c = tj_max - ((msr >> 16) & 0x7F);
-       }
+       return RAPL_UNIT_INVALID;
+}
 
-       if (DO_BIC(BIC_CORE_THROT_CNT))
-               get_core_throt_cnt(cpu, &c->core_throt_cnt);
+static double read_perf_rapl_scale(const char *subsys, const char *event_name)
+{
+       const char *const path_format = "/sys/bus/event_source/devices/%s/events/%s.scale";
+       const char *const format = "%lf";
+       char path[128];
+       double scale;
 
-       if (platform->rapl_msrs & RAPL_AMD_F17H) {
-               if (get_msr(cpu, MSR_CORE_ENERGY_STAT, &msr))
-                       return -14;
-               c->core_energy = msr & 0xFFFFFFFF;
-       }
+       snprintf(path, sizeof(path), path_format, subsys, event_name);
 
-       for (i = 0, mp = sys.cp; mp; i++, mp = mp->next) {
-               if (get_mp(cpu, mp, &c->counter[i]))
-                       return -10;
-       }
+       if (read_perf_counter_info(path, format, &scale))
+               return 0.0;
 
-       /* collect package counters only for 1st core in package */
-       if (!is_cpu_first_core_in_package(t, c, p))
-               goto done;
+       return scale;
+}
 
-       if (DO_BIC(BIC_Totl_c0)) {
-               if (get_msr(cpu, MSR_PKG_WEIGHTED_CORE_C0_RES, &p->pkg_wtd_core_c0))
-                       return -10;
-       }
-       if (DO_BIC(BIC_Any_c0)) {
-               if (get_msr(cpu, MSR_PKG_ANY_CORE_C0_RES, &p->pkg_any_core_c0))
-                       return -11;
-       }
-       if (DO_BIC(BIC_GFX_c0)) {
-               if (get_msr(cpu, MSR_PKG_ANY_GFXE_C0_RES, &p->pkg_any_gfxe_c0))
-                       return -12;
-       }
-       if (DO_BIC(BIC_CPUGFX)) {
-               if (get_msr(cpu, MSR_PKG_BOTH_CORE_GFXE_C0_RES, &p->pkg_both_core_gfxe_c0))
-                       return -13;
-       }
-       if (DO_BIC(BIC_Pkgpc3))
-               if (get_msr(cpu, MSR_PKG_C3_RESIDENCY, &p->pc3))
-                       return -9;
-       if (DO_BIC(BIC_Pkgpc6)) {
+static struct amperf_group_fd open_amperf_fd(int cpu)
+{
+       const unsigned int msr_type = read_msr_type();
+       const unsigned int aperf_config = read_aperf_config();
+       const unsigned int mperf_config = read_mperf_config();
+       struct amperf_group_fd fds = {.aperf = -1, .mperf = -1 };
+
+       fds.aperf = open_perf_counter(cpu, msr_type, aperf_config, -1, PERF_FORMAT_GROUP);
+       fds.mperf = open_perf_counter(cpu, msr_type, mperf_config, fds.aperf, PERF_FORMAT_GROUP);
+
+       return fds;
+}
+
+static int get_amperf_fd(int cpu)
+{
+       assert(fd_amperf_percpu);
+
+       if (fd_amperf_percpu[cpu].aperf)
+               return fd_amperf_percpu[cpu].aperf;
+
+       fd_amperf_percpu[cpu] = open_amperf_fd(cpu);
+
+       return fd_amperf_percpu[cpu].aperf;
+}
+
+/* Read APERF, MPERF and TSC using the perf API. */
+static int read_aperf_mperf_tsc_perf(struct thread_data *t, int cpu)
+{
+       union {
+               struct {
+                       unsigned long nr_entries;
+                       unsigned long aperf;
+                       unsigned long mperf;
+               };
+
+               unsigned long as_array[3];
+       } cnt;
+
+       const int fd_amperf = get_amperf_fd(cpu);
+
+       /*
+        * Read the TSC with rdtsc, because we want the absolute value and not
+        * the offset from the start of the counter.
+        */
+       t->tsc = rdtsc();
+
+       const int n = read(fd_amperf, &cnt.as_array[0], sizeof(cnt.as_array));
+
+       if (n != sizeof(cnt.as_array))
+               return -2;
+
+       t->aperf = cnt.aperf * aperf_mperf_multiplier;
+       t->mperf = cnt.mperf * aperf_mperf_multiplier;
+
+       return 0;
+}
+
+/* Read APERF, MPERF and TSC using the MSR driver and rdtsc instruction. */
+static int read_aperf_mperf_tsc_msr(struct thread_data *t, int cpu)
+{
+       unsigned long long tsc_before, tsc_between, tsc_after, aperf_time, mperf_time;
+       int aperf_mperf_retry_count = 0;
+
+       /*
+        * The TSC, APERF and MPERF must be read together for
+        * APERF/MPERF and MPERF/TSC to give accurate results.
+        *
+        * Unfortunately, APERF and MPERF are read by
+        * individual system calls, so delays may occur
+        * between them.  If the time to read them
+        * varies by a large amount, we re-read them.
+        */
+
+       /*
+        * This initial dummy APERF read has been seen to
+        * reduce jitter in the subsequent reads.
+        */
+
+       if (get_msr(cpu, MSR_IA32_APERF, &t->aperf))
+               return -3;
+
+retry:
+       t->tsc = rdtsc();       /* re-read close to APERF */
+
+       tsc_before = t->tsc;
+
+       if (get_msr(cpu, MSR_IA32_APERF, &t->aperf))
+               return -3;
+
+       tsc_between = rdtsc();
+
+       if (get_msr(cpu, MSR_IA32_MPERF, &t->mperf))
+               return -4;
+
+       tsc_after = rdtsc();
+
+       aperf_time = tsc_between - tsc_before;
+       mperf_time = tsc_after - tsc_between;
+
+       /*
+        * If the system call latencies to read APERF and MPERF
+        * differ by more than 2x, then try again.
+        */
+       if ((aperf_time > (2 * mperf_time)) || (mperf_time > (2 * aperf_time))) {
+               aperf_mperf_retry_count++;
+               if (aperf_mperf_retry_count < 5)
+                       goto retry;
+               else
+                       warnx("cpu%d jitter %lld %lld", cpu, aperf_time, mperf_time);
+       }
+       aperf_mperf_retry_count = 0;
+
+       t->aperf = t->aperf * aperf_mperf_multiplier;
+       t->mperf = t->mperf * aperf_mperf_multiplier;
+
+       return 0;
+}
+
+size_t rapl_counter_info_count_perf(const struct rapl_counter_info_t *rci)
+{
+       size_t ret = 0;
+
+       for (int i = 0; i < NUM_RAPL_COUNTERS; ++i)
+               if (rci->source[i] == RAPL_SOURCE_PERF)
+                       ++ret;
+
+       return ret;
+}
+
+void write_rapl_counter(struct rapl_counter *rc, struct rapl_counter_info_t *rci, unsigned int idx)
+{
+       rc->raw_value = rci->data[idx];
+       rc->unit = rci->unit[idx];
+       rc->scale = rci->scale[idx];
+}
+
+int get_rapl_counters(int cpu, int domain, struct core_data *c, struct pkg_data *p)
+{
+       unsigned long long perf_data[NUM_RAPL_COUNTERS + 1];
+       struct rapl_counter_info_t *rci = &rapl_counter_info_perdomain[domain];
+
+       if (debug)
+               fprintf(stderr, "%s: cpu%d domain%d\n", __func__, cpu, domain);
+
+       assert(rapl_counter_info_perdomain);
+
+       /*
+        * If we have any perf counters to read, read them all now, in bulk
+        */
+       if (rci->fd_perf != -1) {
+               size_t num_perf_counters = rapl_counter_info_count_perf(rci);
+               const ssize_t expected_read_size = (num_perf_counters + 1) * sizeof(unsigned long long);
+               const ssize_t actual_read_size = read(rci->fd_perf, &perf_data[0], sizeof(perf_data));
+
+               if (actual_read_size != expected_read_size)
+                       err(-1, "%s: failed to read perf_data (%zd %zd)", __func__, expected_read_size,
+                           actual_read_size);
+       }
+
+       for (unsigned int i = 0, pi = 1; i < NUM_RAPL_COUNTERS; ++i) {
+               switch (rci->source[i]) {
+               case RAPL_SOURCE_NONE:
+                       break;
+
+               case RAPL_SOURCE_PERF:
+                       assert(pi < ARRAY_SIZE(perf_data));
+                       assert(rci->fd_perf != -1);
+
+                       if (debug)
+                               fprintf(stderr, "Reading rapl counter via perf at %u (%llu %e %lf)\n",
+                                       i, perf_data[pi], rci->scale[i], perf_data[pi] * rci->scale[i]);
+
+                       rci->data[i] = perf_data[pi];
+
+                       ++pi;
+                       break;
+
+               case RAPL_SOURCE_MSR:
+                       if (debug)
+                               fprintf(stderr, "Reading rapl counter via msr at %u\n", i);
+
+                       assert(!no_msr);
+                       if (rci->flags[i] & RAPL_COUNTER_FLAG_USE_MSR_SUM) {
+                               if (get_msr_sum(cpu, rci->msr[i], &rci->data[i]))
+                                       return -13 - i;
+                       } else {
+                               if (get_msr(cpu, rci->msr[i], &rci->data[i]))
+                                       return -13 - i;
+                       }
+
+                       rci->data[i] &= rci->msr_mask[i];
+                       if (rci->msr_shift[i] >= 0)
+                               rci->data[i] >>= abs(rci->msr_shift[i]);
+                       else
+                               rci->data[i] <<= abs(rci->msr_shift[i]);
+
+                       break;
+               }
+       }
+
+       _Static_assert(NUM_RAPL_COUNTERS == 7, "NUM_RAPL_COUNTERS must match the write_rapl_counter() calls below");
+       write_rapl_counter(&p->energy_pkg, rci, RAPL_RCI_INDEX_ENERGY_PKG);
+       write_rapl_counter(&p->energy_cores, rci, RAPL_RCI_INDEX_ENERGY_CORES);
+       write_rapl_counter(&p->energy_dram, rci, RAPL_RCI_INDEX_DRAM);
+       write_rapl_counter(&p->energy_gfx, rci, RAPL_RCI_INDEX_GFX);
+       write_rapl_counter(&p->rapl_pkg_perf_status, rci, RAPL_RCI_INDEX_PKG_PERF_STATUS);
+       write_rapl_counter(&p->rapl_dram_perf_status, rci, RAPL_RCI_INDEX_DRAM_PERF_STATUS);
+       write_rapl_counter(&c->core_energy, rci, RAPL_RCI_INDEX_CORE_ENERGY);
+
+       return 0;
+}
+
+/*
+ * get_counters(...)
+ * migrate to cpu
+ * acquire and record local counters for that cpu
+ */
+int get_counters(struct thread_data *t, struct core_data *c, struct pkg_data *p)
+{
+       int cpu = t->cpu_id;
+       unsigned long long msr;
+       struct msr_counter *mp;
+       int i;
+       int status;
+
+       if (cpu_migrate(cpu)) {
+               fprintf(outf, "%s: Could not migrate to CPU %d\n", __func__, cpu);
+               return -1;
+       }
+
+       gettimeofday(&t->tv_begin, (struct timezone *)NULL);
+
+       if (first_counter_read)
+               get_apic_id(t);
+
+       t->tsc = rdtsc();       /* we are running on local CPU of interest */
+
+       if (DO_BIC(BIC_Avg_MHz) || DO_BIC(BIC_Busy) || DO_BIC(BIC_Bzy_MHz) || DO_BIC(BIC_IPC)
+           || soft_c1_residency_display(BIC_Avg_MHz)) {
+               int status = -1;
+
+               assert(!no_perf || !no_msr);
+
+               switch (amperf_source) {
+               case AMPERF_SOURCE_PERF:
+                       status = read_aperf_mperf_tsc_perf(t, cpu);
+                       break;
+               case AMPERF_SOURCE_MSR:
+                       status = read_aperf_mperf_tsc_msr(t, cpu);
+                       break;
+               }
+
+               if (status != 0)
+                       return status;
+       }
+
+       if (DO_BIC(BIC_IPC))
+               if (read(get_instr_count_fd(cpu), &t->instr_count, sizeof(long long)) != sizeof(long long))
+                       return -4;
+
+       if (DO_BIC(BIC_IRQ))
+               t->irq_count = irqs_per_cpu[cpu];
+       if (DO_BIC(BIC_SMI)) {
+               if (get_msr(cpu, MSR_SMI_COUNT, &msr))
+                       return -5;
+               t->smi_count = msr & 0xFFFFFFFF;
+       }
+       if (DO_BIC(BIC_CPU_c1) && platform->has_msr_core_c1_res) {
+               if (get_msr(cpu, MSR_CORE_C1_RES, &t->c1))
+                       return -6;
+       }
+
+       for (i = 0, mp = sys.tp; mp; i++, mp = mp->next) {
+               if (get_mp(cpu, mp, &t->counter[i]))
+                       return -10;
+       }
+
+       /* collect core counters only for 1st thread in core */
+       if (!is_cpu_first_thread_in_core(t, c, p))
+               goto done;
+
+       if (platform->has_per_core_rapl) {
+               status = get_rapl_counters(cpu, c->core_id, c, p);
+               if (status != 0)
+                       return status;
+       }
+
+       if (DO_BIC(BIC_CPU_c3) || soft_c1_residency_display(BIC_CPU_c3)) {
+               if (get_msr(cpu, MSR_CORE_C3_RESIDENCY, &c->c3))
+                       return -6;
+       }
+
+       if ((DO_BIC(BIC_CPU_c6) || soft_c1_residency_display(BIC_CPU_c6)) && !platform->has_msr_knl_core_c6_residency) {
+               if (get_msr(cpu, MSR_CORE_C6_RESIDENCY, &c->c6))
+                       return -7;
+       } else if (platform->has_msr_knl_core_c6_residency && soft_c1_residency_display(BIC_CPU_c6)) {
+               if (get_msr(cpu, MSR_KNL_CORE_C6_RESIDENCY, &c->c6))
+                       return -7;
+       }
+
+       if (DO_BIC(BIC_CPU_c7) || soft_c1_residency_display(BIC_CPU_c7)) {
+               if (get_msr(cpu, MSR_CORE_C7_RESIDENCY, &c->c7))
+                       return -8;
+               else if (t->is_atom) {
+                       /*
+                        * For Atom CPUs that have core cstates deeper than c6,
+                        * MSR_CORE_C6_RESIDENCY returns the residency of cc6 and deeper.
+                        * Subtract the CC7 (and deeper cstates) residency to get
+                        * accurate cc6 residency.
+                        */
+                       c->c6 -= c->c7;
+               }
+       }
+
+       if (DO_BIC(BIC_Mod_c6))
+               if (get_msr(cpu, MSR_MODULE_C6_RES_MS, &c->mc6_us))
+                       return -8;
+
+       if (DO_BIC(BIC_CoreTmp)) {
+               if (get_msr(cpu, MSR_IA32_THERM_STATUS, &msr))
+                       return -9;
+               c->core_temp_c = tj_max - ((msr >> 16) & 0x7F);
+       }
+
+       if (DO_BIC(BIC_CORE_THROT_CNT))
+               get_core_throt_cnt(cpu, &c->core_throt_cnt);
+
+       for (i = 0, mp = sys.cp; mp; i++, mp = mp->next) {
+               if (get_mp(cpu, mp, &c->counter[i]))
+                       return -10;
+       }
+
+       /* collect package counters only for 1st core in package */
+       if (!is_cpu_first_core_in_package(t, c, p))
+               goto done;
+
+       if (DO_BIC(BIC_Totl_c0)) {
+               if (get_msr(cpu, MSR_PKG_WEIGHTED_CORE_C0_RES, &p->pkg_wtd_core_c0))
+                       return -10;
+       }
+       if (DO_BIC(BIC_Any_c0)) {
+               if (get_msr(cpu, MSR_PKG_ANY_CORE_C0_RES, &p->pkg_any_core_c0))
+                       return -11;
+       }
+       if (DO_BIC(BIC_GFX_c0)) {
+               if (get_msr(cpu, MSR_PKG_ANY_GFXE_C0_RES, &p->pkg_any_gfxe_c0))
+                       return -12;
+       }
+       if (DO_BIC(BIC_CPUGFX)) {
+               if (get_msr(cpu, MSR_PKG_BOTH_CORE_GFXE_C0_RES, &p->pkg_both_core_gfxe_c0))
+                       return -13;
+       }
+       if (DO_BIC(BIC_Pkgpc3))
+               if (get_msr(cpu, MSR_PKG_C3_RESIDENCY, &p->pc3))
+                       return -9;
+       if (DO_BIC(BIC_Pkgpc6)) {
                if (platform->has_msr_atom_pkg_c6_residency) {
                        if (get_msr(cpu, MSR_ATOM_PKG_C6_RESIDENCY, &p->pc6))
                                return -10;
@@ -2911,59 +3618,39 @@ retry:
        if (DO_BIC(BIC_SYS_LPI))
                p->sys_lpi = cpuidle_cur_sys_lpi_us;
 
-       if (platform->rapl_msrs & RAPL_PKG) {
-               if (get_msr_sum(cpu, MSR_PKG_ENERGY_STATUS, &msr))
-                       return -13;
-               p->energy_pkg = msr;
-       }
-       if (platform->rapl_msrs & RAPL_CORE_ENERGY_STATUS) {
-               if (get_msr_sum(cpu, MSR_PP0_ENERGY_STATUS, &msr))
-                       return -14;
-               p->energy_cores = msr;
-       }
-       if (platform->rapl_msrs & RAPL_DRAM) {
-               if (get_msr_sum(cpu, MSR_DRAM_ENERGY_STATUS, &msr))
-                       return -15;
-               p->energy_dram = msr;
-       }
-       if (platform->rapl_msrs & RAPL_GFX) {
-               if (get_msr_sum(cpu, MSR_PP1_ENERGY_STATUS, &msr))
-                       return -16;
-               p->energy_gfx = msr;
-       }
-       if (platform->rapl_msrs & RAPL_PKG_PERF_STATUS) {
-               if (get_msr_sum(cpu, MSR_PKG_PERF_STATUS, &msr))
-                       return -16;
-               p->rapl_pkg_perf_status = msr;
-       }
-       if (platform->rapl_msrs & RAPL_DRAM_PERF_STATUS) {
-               if (get_msr_sum(cpu, MSR_DRAM_PERF_STATUS, &msr))
-                       return -16;
-               p->rapl_dram_perf_status = msr;
-       }
-       if (platform->rapl_msrs & RAPL_AMD_F17H) {
-               if (get_msr_sum(cpu, MSR_PKG_ENERGY_STAT, &msr))
-                       return -13;
-               p->energy_pkg = msr;
+       if (!platform->has_per_core_rapl) {
+               status = get_rapl_counters(cpu, p->package_id, c, p);
+               if (status != 0)
+                       return status;
        }
+
        if (DO_BIC(BIC_PkgTmp)) {
                if (get_msr(cpu, MSR_IA32_PACKAGE_THERM_STATUS, &msr))
                        return -17;
                p->pkg_temp_c = tj_max - ((msr >> 16) & 0x7F);
        }
 
-       if (DO_BIC(BIC_GFX_rc6))
-               p->gfx_rc6_ms = gfx_cur_rc6_ms;
-
        /* n.b. assume die0 uncore frequency applies to whole package */
        if (DO_BIC(BIC_UNCORE_MHZ))
                p->uncore_mhz = get_uncore_mhz(p->package_id, 0);
 
+       if (DO_BIC(BIC_GFX_rc6))
+               p->gfx_rc6_ms = gfx_info[GFX_rc6].val_ull;
+
        if (DO_BIC(BIC_GFXMHz))
-               p->gfx_mhz = gfx_cur_mhz;
+               p->gfx_mhz = gfx_info[GFX_MHz].val;
 
        if (DO_BIC(BIC_GFXACTMHz))
-               p->gfx_act_mhz = gfx_act_mhz;
+               p->gfx_act_mhz = gfx_info[GFX_ACTMHz].val;
+
+       if (DO_BIC(BIC_SAM_mc6))
+               p->sam_mc6_ms = gfx_info[SAM_mc6].val_ull;
+
+       if (DO_BIC(BIC_SAMMHz))
+               p->sam_mhz = gfx_info[SAM_MHz].val;
+
+       if (DO_BIC(BIC_SAMACTMHz))
+               p->sam_act_mhz = gfx_info[SAM_ACTMHz].val;
 
        for (i = 0, mp = sys.pp; mp; i++, mp = mp->next) {
                if (get_mp(cpu, mp, &p->counter[i]))
@@ -3053,7 +3740,7 @@ void probe_cst_limit(void)
        unsigned long long msr;
        int *pkg_cstate_limits;
 
-       if (!platform->has_nhm_msrs)
+       if (!platform->has_nhm_msrs || no_msr)
                return;
 
        switch (platform->cst_limit) {
@@ -3097,7 +3784,7 @@ static void dump_platform_info(void)
        unsigned long long msr;
        unsigned int ratio;
 
-       if (!platform->has_nhm_msrs)
+       if (!platform->has_nhm_msrs || no_msr)
                return;
 
        get_msr(base_cpu, MSR_PLATFORM_INFO, &msr);
@@ -3115,7 +3802,7 @@ static void dump_power_ctl(void)
 {
        unsigned long long msr;
 
-       if (!platform->has_nhm_msrs)
+       if (!platform->has_nhm_msrs || no_msr)
                return;
 
        get_msr(base_cpu, MSR_IA32_POWER_CTL, &msr);
@@ -3321,7 +4008,7 @@ static void dump_cst_cfg(void)
 {
        unsigned long long msr;
 
-       if (!platform->has_nhm_msrs)
+       if (!platform->has_nhm_msrs || no_msr)
                return;
 
        get_msr(base_cpu, MSR_PKG_CST_CONFIG_CONTROL, &msr);
@@ -3393,7 +4080,7 @@ void print_irtl(void)
 {
        unsigned long long msr;
 
-       if (!platform->has_irtl_msrs)
+       if (!platform->has_irtl_msrs || no_msr)
                return;
 
        if (platform->supported_cstates & PC3) {
@@ -3443,12 +4130,64 @@ void free_fd_percpu(void)
 {
        int i;
 
+       if (!fd_percpu)
+               return;
+
        for (i = 0; i < topo.max_cpu_num + 1; ++i) {
                if (fd_percpu[i] != 0)
                        close(fd_percpu[i]);
        }
 
        free(fd_percpu);
+       fd_percpu = NULL;
+}
+
+void free_fd_amperf_percpu(void)
+{
+       int i;
+
+       if (!fd_amperf_percpu)
+               return;
+
+       for (i = 0; i < topo.max_cpu_num + 1; ++i) {
+               if (fd_amperf_percpu[i].mperf != 0)
+                       close(fd_amperf_percpu[i].mperf);
+
+               if (fd_amperf_percpu[i].aperf != 0)
+                       close(fd_amperf_percpu[i].aperf);
+       }
+
+       free(fd_amperf_percpu);
+       fd_amperf_percpu = NULL;
+}
+
+void free_fd_instr_count_percpu(void)
+{
+       if (!fd_instr_count_percpu)
+               return;
+
+       for (int i = 0; i < topo.max_cpu_num + 1; ++i) {
+               if (fd_instr_count_percpu[i] != 0)
+                       close(fd_instr_count_percpu[i]);
+       }
+
+       free(fd_instr_count_percpu);
+       fd_instr_count_percpu = NULL;
+}
+
+void free_fd_rapl_percpu(void)
+{
+       if (!rapl_counter_info_perdomain)
+               return;
+
+       const int num_domains = platform->has_per_core_rapl ? topo.num_cores : topo.num_packages;
+
+       for (int domain_id = 0; domain_id < num_domains; ++domain_id) {
+               if (rapl_counter_info_perdomain[domain_id].fd_perf != -1)
+                       close(rapl_counter_info_perdomain[domain_id].fd_perf);
+       }
+
+       free(rapl_counter_info_perdomain);
 }
 
 void free_all_buffers(void)
@@ -3492,6 +4231,9 @@ void free_all_buffers(void)
        outp = NULL;
 
        free_fd_percpu();
+       free_fd_instr_count_percpu();
+       free_fd_amperf_percpu();
+       free_fd_rapl_percpu();
 
        free(irq_column_2_cpu);
        free(irqs_per_cpu);
@@ -3825,11 +4567,17 @@ static void update_effective_set(bool startup)
                err(1, "%s: cpu str malformat %s\n", PATH_EFFECTIVE_CPUS, cpu_effective_str);
 }
 
+void linux_perf_init(void);
+void rapl_perf_init(void);
+
 void re_initialize(void)
 {
        free_all_buffers();
        setup_all_buffers(false);
-       fprintf(outf, "turbostat: re-initialized with num_cpus %d, allowed_cpus %d\n", topo.num_cpus, topo.allowed_cpus);
+       linux_perf_init();
+       rapl_perf_init();
+       fprintf(outf, "turbostat: re-initialized with num_cpus %d, allowed_cpus %d\n", topo.num_cpus,
+               topo.allowed_cpus);
 }
 
 void set_max_cpu_num(void)
@@ -3940,85 +4688,43 @@ int snapshot_proc_interrupts(void)
 }
 
 /*
- * snapshot_gfx_rc6_ms()
+ * snapshot_graphics()
  *
- * record snapshot of
- * /sys/class/drm/card0/power/rc6_residency_ms
+ * record snapshot of specified graphics sysfs knob
  *
  * return 1 if config change requires a restart, else return 0
  */
-int snapshot_gfx_rc6_ms(void)
+int snapshot_graphics(int idx)
 {
        FILE *fp;
        int retval;
 
-       fp = fopen_or_die("/sys/class/drm/card0/power/rc6_residency_ms", "r");
-
-       retval = fscanf(fp, "%lld", &gfx_cur_rc6_ms);
-       if (retval != 1)
-               err(1, "GFX rc6");
-
-       fclose(fp);
-
-       return 0;
-}
-
-/*
- * snapshot_gfx_mhz()
- *
- * fall back to /sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz
- * when /sys/class/drm/card0/gt_cur_freq_mhz is not available.
- *
- * return 1 if config change requires a restart, else return 0
- */
-int snapshot_gfx_mhz(void)
-{
-       static FILE *fp;
-       int retval;
-
-       if (fp == NULL) {
-               fp = fopen("/sys/class/drm/card0/gt_cur_freq_mhz", "r");
-               if (!fp)
-                       fp = fopen_or_die("/sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz", "r");
-       } else {
-               rewind(fp);
-               fflush(fp);
-       }
-
-       retval = fscanf(fp, "%d", &gfx_cur_mhz);
-       if (retval != 1)
-               err(1, "GFX MHz");
-
-       return 0;
-}
-
-/*
- * snapshot_gfx_cur_mhz()
- *
- * fall back to /sys/class/graphics/fb0/device/drm/card0/gt_act_freq_mhz
- * when /sys/class/drm/card0/gt_act_freq_mhz is not available.
- *
- * return 1 if config change requires a restart, else return 0
- */
-int snapshot_gfx_act_mhz(void)
-{
-       static FILE *fp;
-       int retval;
-
-       if (fp == NULL) {
-               fp = fopen("/sys/class/drm/card0/gt_act_freq_mhz", "r");
-               if (!fp)
-                       fp = fopen_or_die("/sys/class/graphics/fb0/device/drm/card0/gt_act_freq_mhz", "r");
-       } else {
-               rewind(fp);
-               fflush(fp);
+       switch (idx) {
+       case GFX_rc6:
+       case SAM_mc6:
+               fp = fopen_or_die(gfx_info[idx].path, "r");
+               retval = fscanf(fp, "%lld", &gfx_info[idx].val_ull);
+               if (retval != 1)
+                       err(1, "rc6");
+               fclose(fp);
+               return 0;
+       case GFX_MHz:
+       case GFX_ACTMHz:
+       case SAM_MHz:
+       case SAM_ACTMHz:
+               if (gfx_info[idx].fp == NULL) {
+                       gfx_info[idx].fp = fopen_or_die(gfx_info[idx].path, "r");
+               } else {
+                       rewind(gfx_info[idx].fp);
+                       fflush(gfx_info[idx].fp);
+               }
+               retval = fscanf(gfx_info[idx].fp, "%d", &gfx_info[idx].val);
+               if (retval != 1)
+                       err(1, "MHz");
+               return 0;
+       default:
+               return -EINVAL;
        }
-
-       retval = fscanf(fp, "%d", &gfx_act_mhz);
-       if (retval != 1)
-               err(1, "GFX ACT MHz");
-
-       return 0;
 }
 
 /*
@@ -4083,13 +4789,22 @@ int snapshot_proc_sysfs_files(void)
                        return 1;
 
        if (DO_BIC(BIC_GFX_rc6))
-               snapshot_gfx_rc6_ms();
+               snapshot_graphics(GFX_rc6);
 
        if (DO_BIC(BIC_GFXMHz))
-               snapshot_gfx_mhz();
+               snapshot_graphics(GFX_MHz);
 
        if (DO_BIC(BIC_GFXACTMHz))
-               snapshot_gfx_act_mhz();
+               snapshot_graphics(GFX_ACTMHz);
+
+       if (DO_BIC(BIC_SAM_mc6))
+               snapshot_graphics(SAM_mc6);
+
+       if (DO_BIC(BIC_SAMMHz))
+               snapshot_graphics(SAM_MHz);
+
+       if (DO_BIC(BIC_SAMACTMHz))
+               snapshot_graphics(SAM_ACTMHz);
 
        if (DO_BIC(BIC_CPU_LPI))
                snapshot_cpu_lpi_us();
@@ -4173,6 +4888,8 @@ int get_msr_sum(int cpu, off_t offset, unsigned long long *msr)
        int ret, idx;
        unsigned long long msr_cur, msr_last;
 
+       assert(!no_msr);
+
        if (!per_cpu_msr_sum)
                return 1;
 
@@ -4201,6 +4918,8 @@ static int update_msr_sum(struct thread_data *t, struct core_data *c, struct pkg
        UNUSED(c);
        UNUSED(p);
 
+       assert(!no_msr);
+
        for (i = IDX_PKG_ENERGY; i < IDX_COUNT; i++) {
                unsigned long long msr_cur, msr_last;
                off_t offset;
@@ -4280,7 +4999,8 @@ release_msr:
 
 /*
  * set_my_sched_priority(pri)
- * return previous
+ * return previous priority on success
+ * return value < -20 on failure
  */
 int set_my_sched_priority(int priority)
 {
@@ -4290,16 +5010,16 @@ int set_my_sched_priority(int priority)
        errno = 0;
        original_priority = getpriority(PRIO_PROCESS, 0);
        if (errno && (original_priority == -1))
-               err(errno, "getpriority");
+               return -21;
 
        retval = setpriority(PRIO_PROCESS, 0, priority);
        if (retval)
-               errx(retval, "capget(CAP_SYS_NICE) failed,try \"# setcap cap_sys_nice=ep %s\"", progname);
+               return -21;
 
        errno = 0;
        retval = getpriority(PRIO_PROCESS, 0);
        if (retval != priority)
-               err(retval, "getpriority(%d) != setpriority(%d)", retval, priority);
+               return -21;
 
        return original_priority;
 }
@@ -4314,6 +5034,9 @@ void turbostat_loop()
 
        /*
         * elevate own priority for interval mode
+        *
+        * ignore on error - we probably don't have permission to set it, but
+        * it's not a big deal
         */
        set_my_sched_priority(-20);
 
@@ -4399,10 +5122,13 @@ void check_dev_msr()
        struct stat sb;
        char pathname[32];
 
+       if (no_msr)
+               return;
+
        sprintf(pathname, "/dev/cpu/%d/msr", base_cpu);
        if (stat(pathname, &sb))
                if (system("/sbin/modprobe msr > /dev/null 2>&1"))
-                       err(-5, "no /dev/cpu/0/msr, Try \"# modprobe msr\" ");
+                       no_msr = 1;
 }
 
 /*
@@ -4414,47 +5140,51 @@ int check_for_cap_sys_rawio(void)
 {
        cap_t caps;
        cap_flag_value_t cap_flag_value;
+       int ret = 0;
 
        caps = cap_get_proc();
        if (caps == NULL)
-               err(-6, "cap_get_proc\n");
+               return 1;
 
-       if (cap_get_flag(caps, CAP_SYS_RAWIO, CAP_EFFECTIVE, &cap_flag_value))
-               err(-6, "cap_get\n");
+       if (cap_get_flag(caps, CAP_SYS_RAWIO, CAP_EFFECTIVE, &cap_flag_value)) {
+               ret = 1;
+               goto free_and_exit;
+       }
 
        if (cap_flag_value != CAP_SET) {
-               warnx("capget(CAP_SYS_RAWIO) failed," " try \"# setcap cap_sys_rawio=ep %s\"", progname);
-               return 1;
+               ret = 1;
+               goto free_and_exit;
        }
 
+free_and_exit:
        if (cap_free(caps) == -1)
                err(-6, "cap_free\n");
 
-       return 0;
+       return ret;
 }
 
-void check_permissions(void)
+void check_msr_permission(void)
 {
-       int do_exit = 0;
+       int failed = 0;
        char pathname[32];
 
+       if (no_msr)
+               return;
+
        /* check for CAP_SYS_RAWIO */
-       do_exit += check_for_cap_sys_rawio();
+       failed += check_for_cap_sys_rawio();
 
        /* test file permissions */
        sprintf(pathname, "/dev/cpu/%d/msr", base_cpu);
        if (euidaccess(pathname, R_OK)) {
-               do_exit++;
-               warn("/dev/cpu/0/msr open failed, try chown or chmod +r /dev/cpu/*/msr");
+               failed++;
        }
 
-       /* if all else fails, thell them to be root */
-       if (do_exit)
-               if (getuid() != 0)
-                       warnx("... or simply run as root");
-
-       if (do_exit)
-               exit(-6);
+       if (failed) {
+               warnx("Failed to access %s. Some of the counters may not be available\n"
+                     "\tRun as root to enable them or use %s to disable the access explicitly", pathname, "--no-msr");
+               no_msr = 1;
+       }
 }
 
 void probe_bclk(void)
@@ -4462,7 +5192,7 @@ void probe_bclk(void)
        unsigned long long msr;
        unsigned int base_ratio;
 
-       if (!platform->has_nhm_msrs)
+       if (!platform->has_nhm_msrs || no_msr)
                return;
 
        if (platform->bclk_freq == BCLK_100MHZ)
@@ -4502,7 +5232,7 @@ static void dump_turbo_ratio_info(void)
        if (!has_turbo)
                return;
 
-       if (!platform->has_nhm_msrs)
+       if (!platform->has_nhm_msrs || no_msr)
                return;
 
        if (platform->trl_msrs & TRL_LIMIT2)
@@ -4567,20 +5297,15 @@ static void dump_sysfs_file(char *path)
 static void probe_intel_uncore_frequency(void)
 {
        int i, j;
-       char path[128];
+       char path[256];
 
        if (!genuine_intel)
                return;
 
-       if (access("/sys/devices/system/cpu/intel_uncore_frequency/package_00_die_00", R_OK))
-               return;
-
-       /* Cluster level sysfs not supported yet. */
-       if (!access("/sys/devices/system/cpu/intel_uncore_frequency/uncore00", R_OK))
-               return;
+       if (access("/sys/devices/system/cpu/intel_uncore_frequency/package_00_die_00/current_freq_khz", R_OK))
+               goto probe_cluster;
 
-       if (!access("/sys/devices/system/cpu/intel_uncore_frequency/package_00_die_00/current_freq_khz", R_OK))
-               BIC_PRESENT(BIC_UNCORE_MHZ);
+       BIC_PRESENT(BIC_UNCORE_MHZ);
 
        if (quiet)
                return;
@@ -4588,40 +5313,178 @@ static void probe_intel_uncore_frequency(void)
        for (i = 0; i < topo.num_packages; ++i) {
                for (j = 0; j < topo.num_die; ++j) {
                        int k, l;
+                       char path_base[128];
 
-                       sprintf(path, "/sys/devices/system/cpu/intel_uncore_frequency/package_0%d_die_0%d/min_freq_khz",
-                               i, j);
+                       sprintf(path_base, "/sys/devices/system/cpu/intel_uncore_frequency/package_%02d_die_%02d", i,
+                               j);
+
+                       sprintf(path, "%s/min_freq_khz", path_base);
                        k = read_sysfs_int(path);
-                       sprintf(path, "/sys/devices/system/cpu/intel_uncore_frequency/package_0%d_die_0%d/max_freq_khz",
-                               i, j);
+                       sprintf(path, "%s/max_freq_khz", path_base);
                        l = read_sysfs_int(path);
-                       fprintf(outf, "Uncore Frequency pkg%d die%d: %d - %d MHz ", i, j, k / 1000, l / 1000);
+                       fprintf(outf, "Uncore Frequency package%d die%d: %d - %d MHz ", i, j, k / 1000, l / 1000);
 
-                       sprintf(path,
-                               "/sys/devices/system/cpu/intel_uncore_frequency/package_0%d_die_0%d/initial_min_freq_khz",
-                               i, j);
+                       sprintf(path, "%s/initial_min_freq_khz", path_base);
                        k = read_sysfs_int(path);
-                       sprintf(path,
-                               "/sys/devices/system/cpu/intel_uncore_frequency/package_0%d_die_0%d/initial_max_freq_khz",
-                               i, j);
+                       sprintf(path, "%s/initial_max_freq_khz", path_base);
                        l = read_sysfs_int(path);
-                       fprintf(outf, "(%d - %d MHz)\n", k / 1000, l / 1000);
+                       fprintf(outf, "(%d - %d MHz)", k / 1000, l / 1000);
+
+                       sprintf(path, "%s/current_freq_khz", path_base);
+                       k = read_sysfs_int(path);
+                       fprintf(outf, " %d MHz\n", k / 1000);
                }
        }
+       return;
+
+probe_cluster:
+       if (access("/sys/devices/system/cpu/intel_uncore_frequency/uncore00/current_freq_khz", R_OK))
+               return;
+
+       if (quiet)
+               return;
+
+       for (i = 0;; ++i) {
+               int k, l;
+               char path_base[128];
+               int package_id, domain_id, cluster_id;
+
+               sprintf(path_base, "/sys/devices/system/cpu/intel_uncore_frequency/uncore%02d", i);
+
+               if (access(path_base, R_OK))
+                       break;
+
+               sprintf(path, "%s/package_id", path_base);
+               package_id = read_sysfs_int(path);
+
+               sprintf(path, "%s/domain_id", path_base);
+               domain_id = read_sysfs_int(path);
+
+               sprintf(path, "%s/fabric_cluster_id", path_base);
+               cluster_id = read_sysfs_int(path);
+
+               sprintf(path, "%s/min_freq_khz", path_base);
+               k = read_sysfs_int(path);
+               sprintf(path, "%s/max_freq_khz", path_base);
+               l = read_sysfs_int(path);
+               fprintf(outf, "Uncore Frequency package%d domain%d cluster%d: %d - %d MHz ", package_id, domain_id,
+                       cluster_id, k / 1000, l / 1000);
+
+               sprintf(path, "%s/initial_min_freq_khz", path_base);
+               k = read_sysfs_int(path);
+               sprintf(path, "%s/initial_max_freq_khz", path_base);
+               l = read_sysfs_int(path);
+               fprintf(outf, "(%d - %d MHz)", k / 1000, l / 1000);
+
+               sprintf(path, "%s/current_freq_khz", path_base);
+               k = read_sysfs_int(path);
+               fprintf(outf, " %d MHz\n", k / 1000);
+       }
 }
 
 static void probe_graphics(void)
 {
+       /* Xe graphics sysfs knobs */
+       if (!access("/sys/class/drm/card0/device/tile0/gt0/gtidle/idle_residency_ms", R_OK)) {
+               FILE *fp;
+               char buf[8];
+               bool gt0_is_gt;
+               int idx;
+
+               fp = fopen("/sys/class/drm/card0/device/tile0/gt0/gtidle/name", "r");
+               if (!fp)
+                       goto next;
+
+               if (!fread(buf, sizeof(char), 7, fp)) {
+                       fclose(fp);
+                       goto next;
+               }
+               fclose(fp);
+
+               if (!strncmp(buf, "gt0-rc", strlen("gt0-rc")))
+                       gt0_is_gt = true;
+               else if (!strncmp(buf, "gt0-mc", strlen("gt0-mc")))
+                       gt0_is_gt = false;
+               else
+                       goto next;
+
+               idx = gt0_is_gt ? GFX_rc6 : SAM_mc6;
+               gfx_info[idx].path = "/sys/class/drm/card0/device/tile0/gt0/gtidle/idle_residency_ms";
+
+               idx = gt0_is_gt ? GFX_MHz : SAM_MHz;
+               if (!access("/sys/class/drm/card0/device/tile0/gt0/freq0/cur_freq", R_OK))
+                       gfx_info[idx].path = "/sys/class/drm/card0/device/tile0/gt0/freq0/cur_freq";
+
+               idx = gt0_is_gt ? GFX_ACTMHz : SAM_ACTMHz;
+               if (!access("/sys/class/drm/card0/device/tile0/gt0/freq0/act_freq", R_OK))
+                       gfx_info[idx].path = "/sys/class/drm/card0/device/tile0/gt0/freq0/act_freq";
+
+               idx = gt0_is_gt ? SAM_mc6 : GFX_rc6;
+               if (!access("/sys/class/drm/card0/device/tile0/gt1/gtidle/idle_residency_ms", R_OK))
+                       gfx_info[idx].path = "/sys/class/drm/card0/device/tile0/gt1/gtidle/idle_residency_ms";
+
+               idx = gt0_is_gt ? SAM_MHz : GFX_MHz;
+               if (!access("/sys/class/drm/card0/device/tile0/gt1/freq0/cur_freq", R_OK))
+                       gfx_info[idx].path = "/sys/class/drm/card0/device/tile0/gt1/freq0/cur_freq";
+
+               idx = gt0_is_gt ? SAM_ACTMHz : GFX_ACTMHz;
+               if (!access("/sys/class/drm/card0/device/tile0/gt1/freq0/act_freq", R_OK))
+                       gfx_info[idx].path = "/sys/class/drm/card0/device/tile0/gt1/freq0/act_freq";
+
+               goto end;
+       }
+
+next:
+       /* New i915 graphics sysfs knobs */
+       if (!access("/sys/class/drm/card0/gt/gt0/rc6_residency_ms", R_OK)) {
+               gfx_info[GFX_rc6].path = "/sys/class/drm/card0/gt/gt0/rc6_residency_ms";
+
+               if (!access("/sys/class/drm/card0/gt/gt0/rps_cur_freq_mhz", R_OK))
+                       gfx_info[GFX_MHz].path = "/sys/class/drm/card0/gt/gt0/rps_cur_freq_mhz";
+
+               if (!access("/sys/class/drm/card0/gt/gt0/rps_act_freq_mhz", R_OK))
+                       gfx_info[GFX_ACTMHz].path = "/sys/class/drm/card0/gt/gt0/rps_act_freq_mhz";
+
+               if (!access("/sys/class/drm/card0/gt/gt1/rc6_residency_ms", R_OK))
+                       gfx_info[SAM_mc6].path = "/sys/class/drm/card0/gt/gt1/rc6_residency_ms";
+
+               if (!access("/sys/class/drm/card0/gt/gt1/rps_cur_freq_mhz", R_OK))
+                       gfx_info[SAM_MHz].path = "/sys/class/drm/card0/gt/gt1/rps_cur_freq_mhz";
+
+               if (!access("/sys/class/drm/card0/gt/gt1/rps_act_freq_mhz", R_OK))
+                       gfx_info[SAM_ACTMHz].path = "/sys/class/drm/card0/gt/gt1/rps_act_freq_mhz";
+
+               goto end;
+       }
+
+       /* Fall back to traditional i915 graphics sysfs knobs */
        if (!access("/sys/class/drm/card0/power/rc6_residency_ms", R_OK))
-               BIC_PRESENT(BIC_GFX_rc6);
+               gfx_info[GFX_rc6].path = "/sys/class/drm/card0/power/rc6_residency_ms";
+
+       if (!access("/sys/class/drm/card0/gt_cur_freq_mhz", R_OK))
+               gfx_info[GFX_MHz].path = "/sys/class/drm/card0/gt_cur_freq_mhz";
+       else if (!access("/sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz", R_OK))
+               gfx_info[GFX_MHz].path = "/sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz";
 
-       if (!access("/sys/class/drm/card0/gt_cur_freq_mhz", R_OK) ||
-           !access("/sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz", R_OK))
-               BIC_PRESENT(BIC_GFXMHz);
 
-       if (!access("/sys/class/drm/card0/gt_act_freq_mhz", R_OK) ||
-           !access("/sys/class/graphics/fb0/device/drm/card0/gt_act_freq_mhz", R_OK))
+       if (!access("/sys/class/drm/card0/gt_act_freq_mhz", R_OK))
+               gfx_info[GFX_ACTMHz].path = "/sys/class/drm/card0/gt_act_freq_mhz";
+       else if (!access("/sys/class/graphics/fb0/device/drm/card0/gt_act_freq_mhz", R_OK))
+               gfx_info[GFX_ACTMHz].path = "/sys/class/graphics/fb0/device/drm/card0/gt_act_freq_mhz";
+
+end:
+       if (gfx_info[GFX_rc6].path)
+               BIC_PRESENT(BIC_GFX_rc6);
+       if (gfx_info[GFX_MHz].path)
+               BIC_PRESENT(BIC_GFXMHz);
+       if (gfx_info[GFX_ACTMHz].path)
                BIC_PRESENT(BIC_GFXACTMHz);
+       if (gfx_info[SAM_mc6].path)
+               BIC_PRESENT(BIC_SAM_mc6);
+       if (gfx_info[SAM_MHz].path)
+               BIC_PRESENT(BIC_SAMMHz);
+       if (gfx_info[SAM_ACTMHz].path)
+               BIC_PRESENT(BIC_SAMACTMHz);
 }
 
 static void dump_sysfs_cstate_config(void)
@@ -4783,6 +5646,9 @@ int print_hwp(struct thread_data *t, struct core_data *c, struct pkg_data *p)
        UNUSED(c);
        UNUSED(p);
 
+       if (no_msr)
+               return 0;
+
        if (!has_hwp)
                return 0;
 
@@ -4869,6 +5735,9 @@ int print_perf_limit(struct thread_data *t, struct core_data *c, struct pkg_data
        UNUSED(c);
        UNUSED(p);
 
+       if (no_msr)
+               return 0;
+
        cpu = t->cpu_id;
 
        /* per-package */
@@ -4983,31 +5852,18 @@ void rapl_probe_intel(void)
        unsigned long long msr;
        unsigned int time_unit;
        double tdp;
+       const unsigned long long bic_watt_bits = BIC_PkgWatt | BIC_CorWatt | BIC_RAMWatt | BIC_GFXWatt;
+       const unsigned long long bic_joules_bits = BIC_Pkg_J | BIC_Cor_J | BIC_RAM_J | BIC_GFX_J;
 
-       if (rapl_joules) {
-               if (platform->rapl_msrs & RAPL_PKG_ENERGY_STATUS)
-                       BIC_PRESENT(BIC_Pkg_J);
-               if (platform->rapl_msrs & RAPL_CORE_ENERGY_STATUS)
-                       BIC_PRESENT(BIC_Cor_J);
-               if (platform->rapl_msrs & RAPL_DRAM_ENERGY_STATUS)
-                       BIC_PRESENT(BIC_RAM_J);
-               if (platform->rapl_msrs & RAPL_GFX_ENERGY_STATUS)
-                       BIC_PRESENT(BIC_GFX_J);
-       } else {
-               if (platform->rapl_msrs & RAPL_PKG_ENERGY_STATUS)
-                       BIC_PRESENT(BIC_PkgWatt);
-               if (platform->rapl_msrs & RAPL_CORE_ENERGY_STATUS)
-                       BIC_PRESENT(BIC_CorWatt);
-               if (platform->rapl_msrs & RAPL_DRAM_ENERGY_STATUS)
-                       BIC_PRESENT(BIC_RAMWatt);
-               if (platform->rapl_msrs & RAPL_GFX_ENERGY_STATUS)
-                       BIC_PRESENT(BIC_GFXWatt);
-       }
+       if (rapl_joules)
+               bic_enabled &= ~bic_watt_bits;
+       else
+               bic_enabled &= ~bic_joules_bits;
 
-       if (platform->rapl_msrs & RAPL_PKG_PERF_STATUS)
-               BIC_PRESENT(BIC_PKG__);
-       if (platform->rapl_msrs & RAPL_DRAM_PERF_STATUS)
-               BIC_PRESENT(BIC_RAM__);
+       if (!(platform->rapl_msrs & RAPL_PKG_PERF_STATUS))
+               bic_enabled &= ~BIC_PKG__;
+       if (!(platform->rapl_msrs & RAPL_DRAM_PERF_STATUS))
+               bic_enabled &= ~BIC_RAM__;
 
        /* units on package 0, verify later other packages match */
        if (get_msr(base_cpu, MSR_RAPL_POWER_UNIT, &msr))
@@ -5041,14 +5897,13 @@ void rapl_probe_amd(void)
 {
        unsigned long long msr;
        double tdp;
+       const unsigned long long bic_watt_bits = BIC_PkgWatt | BIC_CorWatt;
+       const unsigned long long bic_joules_bits = BIC_Pkg_J | BIC_Cor_J;
 
-       if (rapl_joules) {
-               BIC_PRESENT(BIC_Pkg_J);
-               BIC_PRESENT(BIC_Cor_J);
-       } else {
-               BIC_PRESENT(BIC_PkgWatt);
-               BIC_PRESENT(BIC_CorWatt);
-       }
+       if (rapl_joules)
+               bic_enabled &= ~bic_watt_bits;
+       else
+               bic_enabled &= ~bic_joules_bits;
 
        if (get_msr(base_cpu, MSR_RAPL_PWR_UNIT, &msr))
                return;
@@ -5202,7 +6057,7 @@ int print_rapl(struct thread_data *t, struct core_data *c, struct pkg_data *p)
  */
 void probe_rapl(void)
 {
-       if (!platform->rapl_msrs)
+       if (!platform->rapl_msrs || no_msr)
                return;
 
        if (genuine_intel)
@@ -5258,7 +6113,7 @@ int set_temperature_target(struct thread_data *t, struct core_data *c, struct pk
        }
 
        /* Temperature Target MSR is Nehalem and newer only */
-       if (!platform->has_nhm_msrs)
+       if (!platform->has_nhm_msrs || no_msr)
                goto guess;
 
        if (get_msr(base_cpu, MSR_IA32_TEMPERATURE_TARGET, &msr))
@@ -5305,6 +6160,9 @@ int print_thermal(struct thread_data *t, struct core_data *c, struct pkg_data *p
        UNUSED(c);
        UNUSED(p);
 
+       if (no_msr)
+               return 0;
+
        if (!(do_dts || do_ptm))
                return 0;
 
@@ -5402,6 +6260,9 @@ void decode_feature_control_msr(void)
 {
        unsigned long long msr;
 
+       if (no_msr)
+               return;
+
        if (!get_msr(base_cpu, MSR_IA32_FEAT_CTL, &msr))
                fprintf(outf, "cpu%d: MSR_IA32_FEATURE_CONTROL: 0x%08llx (%sLocked %s)\n",
                        base_cpu, msr, msr & FEAT_CTL_LOCKED ? "" : "UN-", msr & (1 << 18) ? "SGX" : "");
@@ -5411,6 +6272,9 @@ void decode_misc_enable_msr(void)
 {
        unsigned long long msr;
 
+       if (no_msr)
+               return;
+
        if (!genuine_intel)
                return;
 
@@ -5428,6 +6292,9 @@ void decode_misc_feature_control(void)
 {
        unsigned long long msr;
 
+       if (no_msr)
+               return;
+
        if (!platform->has_msr_misc_feature_control)
                return;
 
@@ -5449,6 +6316,9 @@ void decode_misc_pwr_mgmt_msr(void)
 {
        unsigned long long msr;
 
+       if (no_msr)
+               return;
+
        if (!platform->has_msr_misc_pwr_mgmt)
                return;
 
@@ -5468,6 +6338,9 @@ void decode_c6_demotion_policy_msr(void)
 {
        unsigned long long msr;
 
+       if (no_msr)
+               return;
+
        if (!platform->has_msr_c6_demotion_policy_config)
                return;
 
@@ -5489,7 +6362,8 @@ void print_dev_latency(void)
 
        fd = open(path, O_RDONLY);
        if (fd < 0) {
-               warnx("capget(CAP_SYS_ADMIN) failed, try \"# setcap cap_sys_admin=ep %s\"", progname);
+               if (debug)
+                       warnx("Read %s failed", path);
                return;
        }
 
@@ -5504,23 +6378,260 @@ void print_dev_latency(void)
        close(fd);
 }
 
+static int has_instr_count_access(void)
+{
+       int fd;
+       int has_access;
+
+       if (no_perf)
+               return 0;
+
+       fd = open_perf_counter(base_cpu, PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS, -1, 0);
+       has_access = fd != -1;
+
+       if (fd != -1)
+               close(fd);
+
+       if (!has_access)
+               warnx("Failed to access %s. Some of the counters may not be available\n"
+                     "\tRun as root to enable them or use %s to disable the access explicitly",
+                     "instructions retired perf counter", "--no-perf");
+
+       return has_access;
+}
+
+bool is_aperf_access_required(void)
+{
+       return BIC_IS_ENABLED(BIC_Avg_MHz)
+           || BIC_IS_ENABLED(BIC_Busy)
+           || BIC_IS_ENABLED(BIC_Bzy_MHz)
+           || BIC_IS_ENABLED(BIC_IPC);
+}
+
+int add_rapl_perf_counter_(int cpu, struct rapl_counter_info_t *rci, const struct rapl_counter_arch_info *cai,
+                          double *scale_, enum rapl_unit *unit_)
+{
+       if (no_perf)
+               return -1;
+
+       const double scale = read_perf_rapl_scale(cai->perf_subsys, cai->perf_name);
+
+       if (scale == 0.0)
+               return -1;
+
+       const enum rapl_unit unit = read_perf_rapl_unit(cai->perf_subsys, cai->perf_name);
+
+       if (unit == RAPL_UNIT_INVALID)
+               return -1;
+
+       const unsigned int rapl_type = read_perf_type(cai->perf_subsys);
+       const unsigned int rapl_energy_pkg_config = read_rapl_config(cai->perf_subsys, cai->perf_name);
+
+       const int fd_counter =
+           open_perf_counter(cpu, rapl_type, rapl_energy_pkg_config, rci->fd_perf, PERF_FORMAT_GROUP);
+       if (fd_counter == -1)
+               return -1;
+
+       /* If it's the first counter opened, make it a group descriptor */
+       if (rci->fd_perf == -1)
+               rci->fd_perf = fd_counter;
+
+       *scale_ = scale;
+       *unit_ = unit;
+       return fd_counter;
+}
+
+int add_rapl_perf_counter(int cpu, struct rapl_counter_info_t *rci, const struct rapl_counter_arch_info *cai,
+                         double *scale, enum rapl_unit *unit)
+{
+       int ret = add_rapl_perf_counter_(cpu, rci, cai, scale, unit);
+
+       if (debug)
+               fprintf(stderr, "%s: %d (cpu: %d)\n", __func__, ret, cpu);
+
+       return ret;
+}
+
 /*
  * Linux-perf manages the HW instructions-retired counter
  * by enabling when requested, and hiding rollover
  */
 void linux_perf_init(void)
 {
-       if (!BIC_IS_ENABLED(BIC_IPC))
-               return;
-
        if (access("/proc/sys/kernel/perf_event_paranoid", F_OK))
                return;
 
-       fd_instr_count_percpu = calloc(topo.max_cpu_num + 1, sizeof(int));
-       if (fd_instr_count_percpu == NULL)
-               err(-1, "calloc fd_instr_count_percpu");
+       if (BIC_IS_ENABLED(BIC_IPC) && has_aperf) {
+               fd_instr_count_percpu = calloc(topo.max_cpu_num + 1, sizeof(int));
+               if (fd_instr_count_percpu == NULL)
+                       err(-1, "calloc fd_instr_count_percpu");
+       }
+
+       const bool aperf_required = is_aperf_access_required();
+
+       if (aperf_required && has_aperf && amperf_source == AMPERF_SOURCE_PERF) {
+               fd_amperf_percpu = calloc(topo.max_cpu_num + 1, sizeof(*fd_amperf_percpu));
+               if (fd_amperf_percpu == NULL)
+                       err(-1, "calloc fd_amperf_percpu");
+       }
+}
+
+void rapl_perf_init(void)
+{
+       const int num_domains = platform->has_per_core_rapl ? topo.num_cores : topo.num_packages;
+       bool *domain_visited = calloc(num_domains, sizeof(bool));
+
+       rapl_counter_info_perdomain = calloc(num_domains, sizeof(*rapl_counter_info_perdomain));
+       if (rapl_counter_info_perdomain == NULL)
+               err(-1, "calloc rapl_counter_info_perdomain");
+
+       /*
+        * Initialize rapl_counter_info_perdomain
+        */
+       for (int domain_id = 0; domain_id < num_domains; ++domain_id) {
+               struct rapl_counter_info_t *rci = &rapl_counter_info_perdomain[domain_id];
+
+               rci->fd_perf = -1;
+               for (size_t i = 0; i < NUM_RAPL_COUNTERS; ++i) {
+                       rci->data[i] = 0;
+                       rci->source[i] = RAPL_SOURCE_NONE;
+               }
+       }
+
+       /*
+        * Open/probe the counters
+        * If can't get it via perf, fallback to MSR
+        */
+       for (size_t i = 0; i < ARRAY_SIZE(rapl_counter_arch_infos); ++i) {
+
+               const struct rapl_counter_arch_info *const cai = &rapl_counter_arch_infos[i];
+               bool has_counter = 0;
+               double scale;
+               enum rapl_unit unit;
+               int next_domain;
+
+               memset(domain_visited, 0, num_domains * sizeof(*domain_visited));
+
+               for (int cpu = 0; cpu < topo.max_cpu_num + 1; ++cpu) {
+
+                       if (cpu_is_not_allowed(cpu))
+                               continue;
+
+                       /* Skip already seen and handled RAPL domains */
+                       next_domain =
+                           platform->has_per_core_rapl ? cpus[cpu].physical_core_id : cpus[cpu].physical_package_id;
+
+                       if (domain_visited[next_domain])
+                               continue;
+
+                       domain_visited[next_domain] = 1;
+
+                       struct rapl_counter_info_t *rci = &rapl_counter_info_perdomain[next_domain];
+
+                       /* Check if the counter is enabled and accessible */
+                       if (BIC_IS_ENABLED(cai->bic) && (platform->rapl_msrs & cai->feature_mask)) {
+
+                               /* Use perf API for this counter */
+                               if (!no_perf && cai->perf_name
+                                   && add_rapl_perf_counter(cpu, rci, cai, &scale, &unit) != -1) {
+                                       rci->source[cai->rci_index] = RAPL_SOURCE_PERF;
+                                       rci->scale[cai->rci_index] = scale * cai->compat_scale;
+                                       rci->unit[cai->rci_index] = unit;
+                                       rci->flags[cai->rci_index] = cai->flags;
+
+                                       /* Use MSR for this counter */
+                               } else if (!no_msr && cai->msr && probe_msr(cpu, cai->msr) == 0) {
+                                       rci->source[cai->rci_index] = RAPL_SOURCE_MSR;
+                                       rci->msr[cai->rci_index] = cai->msr;
+                                       rci->msr_mask[cai->rci_index] = cai->msr_mask;
+                                       rci->msr_shift[cai->rci_index] = cai->msr_shift;
+                                       rci->unit[cai->rci_index] = RAPL_UNIT_JOULES;
+                                       rci->scale[cai->rci_index] = *cai->platform_rapl_msr_scale * cai->compat_scale;
+                                       rci->flags[cai->rci_index] = cai->flags;
+                               }
+                       }
+
+                       if (rci->source[cai->rci_index] != RAPL_SOURCE_NONE)
+                               has_counter = 1;
+               }
+
+               /* If any CPU has access to the counter, make it present */
+               if (has_counter)
+                       BIC_PRESENT(cai->bic);
+       }
+
+       free(domain_visited);
+}
+
+static int has_amperf_access_via_msr(void)
+{
+       if (no_msr)
+               return 0;
+
+       if (probe_msr(base_cpu, MSR_IA32_APERF))
+               return 0;
+
+       if (probe_msr(base_cpu, MSR_IA32_MPERF))
+               return 0;
+
+       return 1;
+}
+
+static int has_amperf_access_via_perf(void)
+{
+       struct amperf_group_fd fds;
+
+       /*
+        * Cache the last result, so we don't warn the user multiple times
+        *
+        * Negative means cached, no access
+        * Zero means not cached
+        * Positive means cached, has access
+        */
+       static int has_access_cached;
+
+       if (no_perf)
+               return 0;
+
+       if (has_access_cached != 0)
+               return has_access_cached > 0;
+
+       fds = open_amperf_fd(base_cpu);
+       has_access_cached = (fds.aperf != -1) && (fds.mperf != -1);
+
+       if (fds.aperf == -1)
+               warnx("Failed to access %s. Some of the counters may not be available\n"
+                     "\tRun as root to enable them or use %s to disable the access explicitly",
+                     "APERF perf counter", "--no-perf");
+       else
+               close(fds.aperf);
+
+       if (fds.mperf == -1)
+               warnx("Failed to access %s. Some of the counters may not be available\n"
+                     "\tRun as root to enable them or use %s to disable the access explicitly",
+                     "MPERF perf counter", "--no-perf");
+       else
+               close(fds.mperf);
+
+       if (has_access_cached == 0)
+               has_access_cached = -1;
+
+       return has_access_cached > 0;
+}
+
+/* Check if we can access APERF and MPERF */
+static int has_amperf_access(void)
+{
+       if (!is_aperf_access_required())
+               return 0;
+
+       if (!no_msr && has_amperf_access_via_msr())
+               return 1;
+
+       if (!no_perf && has_amperf_access_via_perf())
+               return 1;
 
-       BIC_PRESENT(BIC_IPC);
+       return 0;
 }
 
 void probe_cstates(void)
@@ -5563,7 +6674,7 @@ void probe_cstates(void)
        if (platform->has_msr_module_c6_res_ms)
                BIC_PRESENT(BIC_Mod_c6);
 
-       if (platform->has_ext_cst_msrs) {
+       if (platform->has_ext_cst_msrs && !no_msr) {
                BIC_PRESENT(BIC_Totl_c0);
                BIC_PRESENT(BIC_Any_c0);
                BIC_PRESENT(BIC_GFX_c0);
@@ -5623,6 +6734,7 @@ void process_cpuid()
        unsigned int eax, ebx, ecx, edx;
        unsigned int fms, family, model, stepping, ecx_flags, edx_flags;
        unsigned long long ucode_patch = 0;
+       bool ucode_patch_valid = false;
 
        eax = ebx = ecx = edx = 0;
 
@@ -5650,8 +6762,12 @@ void process_cpuid()
        ecx_flags = ecx;
        edx_flags = edx;
 
-       if (get_msr(sched_getcpu(), MSR_IA32_UCODE_REV, &ucode_patch))
-               warnx("get_msr(UCODE)");
+       if (!no_msr) {
+               if (get_msr(sched_getcpu(), MSR_IA32_UCODE_REV, &ucode_patch))
+                       warnx("get_msr(UCODE)");
+               else
+                       ucode_patch_valid = true;
+       }
 
        /*
         * check max extended function levels of CPUID.
@@ -5662,9 +6778,12 @@ void process_cpuid()
        __cpuid(0x80000000, max_extended_level, ebx, ecx, edx);
 
        if (!quiet) {
-               fprintf(outf, "CPUID(1): family:model:stepping 0x%x:%x:%x (%d:%d:%d) microcode 0x%x\n",
-                       family, model, stepping, family, model, stepping,
-                       (unsigned int)((ucode_patch >> 32) & 0xFFFFFFFF));
+               fprintf(outf, "CPUID(1): family:model:stepping 0x%x:%x:%x (%d:%d:%d)",
+                       family, model, stepping, family, model, stepping);
+               if (ucode_patch_valid)
+                       fprintf(outf, " microcode 0x%x", (unsigned int)((ucode_patch >> 32) & 0xFFFFFFFF));
+               fputc('\n', outf);
+
                fprintf(outf, "CPUID(0x80000000): max_extended_levels: 0x%x\n", max_extended_level);
                fprintf(outf, "CPUID(1): %s %s %s %s %s %s %s %s %s %s\n",
                        ecx_flags & (1 << 0) ? "SSE3" : "-",
@@ -5700,10 +6819,11 @@ void process_cpuid()
 
        __cpuid(0x6, eax, ebx, ecx, edx);
        has_aperf = ecx & (1 << 0);
-       if (has_aperf) {
+       if (has_aperf && has_amperf_access()) {
                BIC_PRESENT(BIC_Avg_MHz);
                BIC_PRESENT(BIC_Busy);
                BIC_PRESENT(BIC_Bzy_MHz);
+               BIC_PRESENT(BIC_IPC);
        }
        do_dts = eax & (1 << 0);
        if (do_dts)
@@ -5786,6 +6906,15 @@ void process_cpuid()
                base_mhz = max_mhz = bus_mhz = edx = 0;
 
                __cpuid(0x16, base_mhz, max_mhz, bus_mhz, edx);
+
+               bclk = bus_mhz;
+
+               base_hz = base_mhz * 1000000;
+               has_base_hz = 1;
+
+               if (platform->enable_tsc_tweak)
+                       tsc_tweak = base_hz / tsc_hz;
+
                if (!quiet)
                        fprintf(outf, "CPUID(0x16): base_mhz: %d max_mhz: %d bus_mhz: %d\n",
                                base_mhz, max_mhz, bus_mhz);
@@ -5814,7 +6943,7 @@ void probe_pm_features(void)
 
        probe_thermal();
 
-       if (platform->has_nhm_msrs)
+       if (platform->has_nhm_msrs && !no_msr)
                BIC_PRESENT(BIC_SMI);
 
        if (!quiet)
@@ -6142,6 +7271,7 @@ void topology_update(void)
        topo.allowed_packages = 0;
        for_all_cpus(update_topo, ODD_COUNTERS);
 }
+
 void setup_all_buffers(bool startup)
 {
        topology_probe(startup);
@@ -6169,21 +7299,129 @@ void set_base_cpu(void)
        err(-ENODEV, "No valid cpus found");
 }
 
+static void set_amperf_source(void)
+{
+       amperf_source = AMPERF_SOURCE_PERF;
+
+       const bool aperf_required = is_aperf_access_required();
+
+       if (no_perf || !aperf_required || !has_amperf_access_via_perf())
+               amperf_source = AMPERF_SOURCE_MSR;
+
+       if (quiet || !debug)
+               return;
+
+       fprintf(outf, "aperf/mperf source preference: %s\n", amperf_source == AMPERF_SOURCE_MSR ? "msr" : "perf");
+}
+
+bool has_added_counters(void)
+{
+       /*
+        * It only makes sense to call this after the command line is parsed,
+        * otherwise sys structure is not populated.
+        */
+
+       return sys.added_core_counters || sys.added_thread_counters || sys.added_package_counters;
+}
+
+bool is_msr_access_required(void)
+{
+       if (no_msr)
+               return false;
+
+       if (has_added_counters())
+               return true;
+
+       return BIC_IS_ENABLED(BIC_SMI)
+           || BIC_IS_ENABLED(BIC_CPU_c1)
+           || BIC_IS_ENABLED(BIC_CPU_c3)
+           || BIC_IS_ENABLED(BIC_CPU_c6)
+           || BIC_IS_ENABLED(BIC_CPU_c7)
+           || BIC_IS_ENABLED(BIC_Mod_c6)
+           || BIC_IS_ENABLED(BIC_CoreTmp)
+           || BIC_IS_ENABLED(BIC_Totl_c0)
+           || BIC_IS_ENABLED(BIC_Any_c0)
+           || BIC_IS_ENABLED(BIC_GFX_c0)
+           || BIC_IS_ENABLED(BIC_CPUGFX)
+           || BIC_IS_ENABLED(BIC_Pkgpc3)
+           || BIC_IS_ENABLED(BIC_Pkgpc6)
+           || BIC_IS_ENABLED(BIC_Pkgpc2)
+           || BIC_IS_ENABLED(BIC_Pkgpc7)
+           || BIC_IS_ENABLED(BIC_Pkgpc8)
+           || BIC_IS_ENABLED(BIC_Pkgpc9)
+           || BIC_IS_ENABLED(BIC_Pkgpc10)
+           /* TODO: Multiplex access with perf */
+           || BIC_IS_ENABLED(BIC_PkgWatt)
+           || BIC_IS_ENABLED(BIC_CorWatt)
+           || BIC_IS_ENABLED(BIC_GFXWatt)
+           || BIC_IS_ENABLED(BIC_RAMWatt)
+           || BIC_IS_ENABLED(BIC_Pkg_J)
+           || BIC_IS_ENABLED(BIC_Cor_J)
+           || BIC_IS_ENABLED(BIC_GFX_J)
+           || BIC_IS_ENABLED(BIC_RAM_J)
+           || BIC_IS_ENABLED(BIC_PKG__)
+           || BIC_IS_ENABLED(BIC_RAM__)
+           || BIC_IS_ENABLED(BIC_PkgTmp)
+           || (is_aperf_access_required() && !has_amperf_access_via_perf());
+}
+
+void check_msr_access(void)
+{
+       if (!is_msr_access_required())
+               no_msr = 1;
+
+       check_dev_msr();
+       check_msr_permission();
+
+       if (no_msr)
+               bic_disable_msr_access();
+}
+
+void check_perf_access(void)
+{
+       const bool instr_count_required = BIC_IS_ENABLED(BIC_IPC);
+
+       if (no_perf || !instr_count_required || !has_instr_count_access())
+               bic_enabled &= ~BIC_IPC;
+
+       const bool aperf_required = is_aperf_access_required();
+
+       if (!aperf_required || !has_amperf_access()) {
+               bic_enabled &= ~BIC_Avg_MHz;
+               bic_enabled &= ~BIC_Busy;
+               bic_enabled &= ~BIC_Bzy_MHz;
+               bic_enabled &= ~BIC_IPC;
+       }
+}
+
 void turbostat_init()
 {
        setup_all_buffers(true);
        set_base_cpu();
-       check_dev_msr();
-       check_permissions();
+       check_msr_access();
+       check_perf_access();
        process_cpuid();
        probe_pm_features();
+       set_amperf_source();
        linux_perf_init();
+       rapl_perf_init();
 
        for_all_cpus(get_cpu_type, ODD_COUNTERS);
        for_all_cpus(get_cpu_type, EVEN_COUNTERS);
 
        if (DO_BIC(BIC_IPC))
                (void)get_instr_count_fd(base_cpu);
+
+       /*
+        * If TSC tweak is needed, but couldn't get it,
+        * disable more BICs, since it can't be reported accurately.
+        */
+       if (platform->enable_tsc_tweak && !has_base_hz) {
+               bic_enabled &= ~BIC_Busy;
+               bic_enabled &= ~BIC_Bzy_MHz;
+       }
 }
 
 int fork_it(char **argv)
@@ -6259,7 +7497,7 @@ int get_and_dump_counters(void)
 
 void print_version()
 {
-       fprintf(outf, "turbostat version 2023.11.07 - Len Brown <lenb@kernel.org>\n");
+       fprintf(outf, "turbostat version 2024.04.08 - Len Brown <lenb@kernel.org>\n");
 }
 
 #define COMMAND_LINE_SIZE 2048
@@ -6291,6 +7529,9 @@ int add_counter(unsigned int msr_num, char *path, char *name,
 {
        struct msr_counter *msrp;
 
+       if (no_msr && msr_num)
+               errx(1, "Requested MSR counter 0x%x, but in --no-msr mode", msr_num);
+
        msrp = calloc(1, sizeof(struct msr_counter));
        if (msrp == NULL) {
                perror("calloc");
@@ -6595,6 +7836,8 @@ void cmdline(int argc, char **argv)
                { "list", no_argument, 0, 'l' },
                { "out", required_argument, 0, 'o' },
                { "quiet", no_argument, 0, 'q' },
+               { "no-msr", no_argument, 0, 'M' },
+               { "no-perf", no_argument, 0, 'P' },
                { "show", required_argument, 0, 's' },
                { "Summary", no_argument, 0, 'S' },
                { "TCC", required_argument, 0, 'T' },
@@ -6604,7 +7847,25 @@ void cmdline(int argc, char **argv)
 
        progname = argv[0];
 
-       while ((opt = getopt_long_only(argc, argv, "+C:c:Dde:hi:Jn:o:qST:v", long_options, &option_index)) != -1) {
+       /*
+        * Parse some options early, because they may make other options invalid,
+        * like adding the MSR counter with --add and at the same time using --no-msr.
+        */
+       while ((opt = getopt_long_only(argc, argv, "MP", long_options, &option_index)) != -1) {
+               switch (opt) {
+               case 'M':
+                       no_msr = 1;
+                       break;
+               case 'P':
+                       no_perf = 1;
+                       break;
+               default:
+                       break;
+               }
+       }
+       optind = 0;
+
+       while ((opt = getopt_long_only(argc, argv, "+C:c:Dde:hi:Jn:o:qMST:v", long_options, &option_index)) != -1) {
                switch (opt) {
                case 'a':
                        parse_add_command(optarg);
@@ -6662,6 +7923,10 @@ void cmdline(int argc, char **argv)
                case 'q':
                        quiet = 1;
                        break;
+               case 'M':
+               case 'P':
+                       /* Parsed earlier */
+                       break;
                case 'n':
                        num_iterations = strtod(optarg, NULL);
 
@@ -6704,6 +7969,22 @@ void cmdline(int argc, char **argv)
        }
 }
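The cmdline() hunks above use a two-pass parse: a first getopt_long_only() loop that recognizes only -M/-P, then an optind reset and the full parse, so that handlers which run during the full pass (such as --add) can already consult no_msr/no_perf. A rough Python analogue of the same pattern, with a hypothetical parse_cmdline() helper and a made-up --add guard mirroring the errx() check in add_counter():

```python
import argparse

def parse_cmdline(argv):
    # Pass 1: recognize only the early flags, ignore everything else
    # (equivalent to the first getopt_long_only() loop over "MP").
    early = argparse.ArgumentParser(add_help=False)
    early.add_argument('--no-msr', action='store_true')
    early.add_argument('--no-perf', action='store_true')
    flags, _rest = early.parse_known_args(argv)

    # Pass 2: the full parse (equivalent to rewinding with optind = 0).
    full = argparse.ArgumentParser()
    full.add_argument('--no-msr', action='store_true')
    full.add_argument('--no-perf', action='store_true')
    full.add_argument('--add', dest='counters', action='append', default=[])
    args = full.parse_args(argv)

    # Mirror of the errx() guard in add_counter(): an MSR counter is
    # invalid once --no-msr is in effect, and flags.no_msr is already
    # valid here thanks to the early pass.
    for counter in args.counters:
        if flags.no_msr and counter.startswith('msr'):
            raise SystemExit(f'Requested MSR counter {counter}, but in --no-msr mode')
    return args

args = parse_cmdline(['--no-msr', '--add', 'perf/cstate_core/c1-residency'])
print(args.no_msr, args.counters)
```

The counter names here are illustrative only; turbostat's real --add syntax is defined in parse_add_command().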
 
+void set_rlimit(void)
+{
+       struct rlimit limit;
+
+       if (getrlimit(RLIMIT_NOFILE, &limit) < 0)
+               err(1, "Failed to get rlimit");
+
+       if (limit.rlim_max < MAX_NOFILE)
+               limit.rlim_max = MAX_NOFILE;
+       if (limit.rlim_cur < MAX_NOFILE)
+               limit.rlim_cur = MAX_NOFILE;
+
+       if (setrlimit(RLIMIT_NOFILE, &limit) < 0)
+               err(1, "Failed to set rlimit");
+}
+
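set_rlimit() raises RLIMIT_NOFILE, presumably because the new perf path can hold a file descriptor per counter per CPU. A minimal Python sketch of the same raise-if-below pattern (the MAX_NOFILE value here is an assumption, not turbostat's actual constant):

```python
import resource

MAX_NOFILE = 0x8000  # assumed target; turbostat's real constant is in turbostat.c

def set_rlimit():
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Same shape as the C code: bump each limit only if it is below the
    # target.  Raising the hard limit requires privilege, which is why
    # the patch calls set_rlimit() only when getuid() == 0.
    resource.setrlimit(resource.RLIMIT_NOFILE,
                       (max(soft, MAX_NOFILE), max(hard, MAX_NOFILE)))

# Unprivileged processes may still raise the soft limit up to the hard limit:
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```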
 int main(int argc, char **argv)
 {
        int fd, ret;
@@ -6729,9 +8010,13 @@ skip_cgroup_setting:
 
        probe_sysfs();
 
+       if (!getuid())
+               set_rlimit();
+
        turbostat_init();
 
-       msr_sum_record();
+       if (!no_msr)
+               msr_sum_record();
 
        /* dump counters and exit */
        if (dump_only)
index c02990bbd56f4cf1cb5ea878f8fa76c4b6057c8d..9007c420d52c5201c40284f4f91cd7687f9d7188 100644 (file)
@@ -3,7 +3,7 @@
 #include <stdbool.h>
 #include <sys/mman.h>
 #include <err.h>
-#include <string.h> /* ffsl() */
+#include <strings.h> /* ffsl() */
 #include <unistd.h> /* _SC_PAGESIZE */
 
 #define BIT_ULL(nr)                   (1ULL << (nr))
diff --git a/tools/testing/selftests/turbostat/defcolumns.py b/tools/testing/selftests/turbostat/defcolumns.py
new file mode 100755 (executable)
index 0000000..d9b0420
--- /dev/null
@@ -0,0 +1,60 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+import subprocess
+from shutil import which
+
+turbostat = which('turbostat')
+if turbostat is None:
+       print('Could not find turbostat binary')
+       exit(1)
+
+timeout = which('timeout')
+if timeout is None:
+       print('Could not find timeout binary')
+       exit(1)
+
+proc_turbostat = subprocess.run([turbostat, '--list'], capture_output = True)
+if proc_turbostat.returncode != 0:
+       print(f'turbostat failed with {proc_turbostat.returncode}')
+       exit(1)
+
+#
+# By default --list also reports the "usec", "Time_Of_Day_Seconds", "APIC"
+# and "X2APIC" columns, which are only visible when running with --debug.
+#
+expected_columns_debug = proc_turbostat.stdout.replace(b',', b'\t').strip()
+expected_columns = expected_columns_debug.replace(b'usec\t', b'').replace(b'Time_Of_Day_Seconds\t', b'').replace(b'X2APIC\t', b'').replace(b'APIC\t', b'')
+
+#
+# Run turbostat with a 0.250 s interval; timeout sends SIGINT after 1 second
+# (and SIGKILL 3 seconds later if turbostat has not exited)
+#
+timeout_argv = [timeout, '--preserve-status', '-s', 'SIGINT', '-k', '3', '1s']
+turbostat_argv = [turbostat, '-i', '0.250']
+
+print(f'Running turbostat with {turbostat_argv=}... ', end = '', flush = True)
+proc_turbostat = subprocess.run(timeout_argv + turbostat_argv, capture_output = True)
+if proc_turbostat.returncode != 0:
+       print(f'turbostat failed with {proc_turbostat.returncode}')
+       exit(1)
+actual_columns = proc_turbostat.stdout.split(b'\n')[0]
+if expected_columns != actual_columns:
+       print(f'turbostat column check failed\n{expected_columns=}\n{actual_columns=}')
+       exit(1)
+print('OK')
+
+#
+# Same, but with --debug
+#
+turbostat_argv.append('--debug')
+
+print(f'Running turbostat with {turbostat_argv=}... ', end = '', flush = True)
+proc_turbostat = subprocess.run(timeout_argv + turbostat_argv, capture_output = True)
+if proc_turbostat.returncode != 0:
+       print(f'turbostat failed with {proc_turbostat.returncode}')
+       exit(1)
+actual_columns = proc_turbostat.stdout.split(b'\n')[0]
+if expected_columns_debug != actual_columns:
+       print(f'turbostat column check failed\n{expected_columns_debug=}\n{actual_columns=}')
+       exit(1)
+print('OK')
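The column normalization at the top of the selftest can be illustrated on a made-up header (real `turbostat --list` output varies by machine); note that `X2APIC` must be stripped before `APIC`, since `APIC\t` is a substring of `X2APIC\t`:

```python
# Made-up sample of `turbostat --list` output, for illustration only.
sample = b'usec,Time_Of_Day_Seconds,Core,CPU,APIC,X2APIC,Busy%\n'

# --list is comma-separated; the live header printed while sampling is
# tab-separated, so convert before comparing.
debug_columns = sample.replace(b',', b'\t').strip()

# Drop the columns that only appear with --debug, longest-match first.
columns = debug_columns
for col in (b'usec\t', b'Time_Of_Day_Seconds\t', b'X2APIC\t', b'APIC\t'):
    columns = columns.replace(col, b'')

assert debug_columns == b'usec\tTime_Of_Day_Seconds\tCore\tCPU\tAPIC\tX2APIC\tBusy%'
assert columns == b'Core\tCPU\tBusy%'
print(columns.decode())
```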