linux-2.6-block.git
10 months ago perf vendor events intel: Update westmereex events to v4
Ian Rogers [Thu, 26 Oct 2023 00:31:47 +0000 (17:31 -0700)]
perf vendor events intel: Update westmereex events to v4

Update westmereex events from v3 to v4 fixing a spelling issue.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20231026003149.3287633-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf vendor events intel: Update meteorlake events to v1.06
Ian Rogers [Thu, 26 Oct 2023 00:31:46 +0000 (17:31 -0700)]
perf vendor events intel: Update meteorlake events to v1.06

Update meteorlake from v1.04 to v1.06 adding the changes from:
https://github.com/intel/perfmon/commit/bc84df043091ec7c98c0629f3d074d9d7a108194
https://github.com/intel/perfmon/commit/405d3ee987d756b5b5d9a64d8a8fa77559822ecf

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20231026003149.3287633-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf vendor events intel: Update knightslanding events to v16
Ian Rogers [Thu, 26 Oct 2023 00:31:45 +0000 (17:31 -0700)]
perf vendor events intel: Update knightslanding events to v16

Update knightslanding from v10 to v16 adding the changes from:
https://github.com/intel/perfmon/commit/6c1f169f6ed63ee1fd75ebb303d0fd06d71196f5
https://github.com/intel/perfmon/commit/b22ca587ec8b5ac20471ea2f14924f63e63afe9d
https://github.com/intel/perfmon/commit/e685286f083ee81cb7dafd0cd8546c79ee433187

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20231026003149.3287633-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf vendor events intel: Add typo fix for ivybridge FP
Ian Rogers [Thu, 26 Oct 2023 00:31:44 +0000 (17:31 -0700)]
perf vendor events intel: Add typo fix for ivybridge FP

Add a missed space.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20231026003149.3287633-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf vendor events intel: Update a spelling in haswell/haswellx
Ian Rogers [Thu, 26 Oct 2023 00:31:43 +0000 (17:31 -0700)]
perf vendor events intel: Update a spelling in haswell/haswellx

The spelling of "in-flight" was switched to "inflight".

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20231026003149.3287633-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf vendor events intel: Update emeraldrapids to v1.01
Ian Rogers [Thu, 26 Oct 2023 00:31:42 +0000 (17:31 -0700)]
perf vendor events intel: Update emeraldrapids to v1.01

Update emeraldrapids to v1.01 from v1.00 adding the changes from:
https://github.com/intel/perfmon/commit/3993b600e032a9fd443ffd828aab73de7cb167e5

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20231026003149.3287633-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf vendor events intel: Update alderlake/alderlaken events to v1.23
Ian Rogers [Thu, 26 Oct 2023 00:31:41 +0000 (17:31 -0700)]
perf vendor events intel: Update alderlake/alderlaken events to v1.23

Update alderlake and alderlaken events from v1.21 to v1.23 adding the
changes from:
https://github.com/intel/perfmon/commit/8df4db9433a2aab59dbbac1a70281032d1af7734
https://github.com/intel/perfmon/commit/846bd247c6e04acc572ca56c992e9e65852bbe63

The tsx_cycles_per_elision metric is updated from PR:
https://github.com/intel/perfmon/pull/116

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20231026003149.3287633-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf build: Disable BPF skeletons if clang version is < 12.0.1
Arnaldo Carvalho de Melo [Fri, 27 Oct 2023 14:18:47 +0000 (11:18 -0300)]
perf build: Disable BPF skeletons if clang version is < 12.0.1

While building on a wide range of distros and clang versions it was
noticed that at least version 12.0.1 (noticed on Alpine 3.15 with
"Alpine clang version 12.0.1") is needed to not fail with BTF generation
errors such as:

Debian:10

  Debian clang version 11.0.1-2~deb10u1:

    CLANG   /tmp/build/perf/util/bpf_skel/.tmp/sample_filter.bpf.o
  <SNIP>
    GENSKEL /tmp/build/perf/util/bpf_skel/sample_filter.skel.h
  libbpf: failed to find BTF for extern 'bpf_cast_to_kern_ctx' [21] section: -2
  Error: failed to open BPF object file: No such file or directory
  make[2]: *** [Makefile.perf:1121: /tmp/build/perf/util/bpf_skel/sample_filter.skel.h] Error 254
  make[2]: *** Deleting file '/tmp/build/perf/util/bpf_skel/sample_filter.skel.h'

Amazon Linux 2:

  clang version 11.1.0 (Amazon Linux 2 11.1.0-1.amzn2.0.2)

    GENSKEL /tmp/build/perf/util/bpf_skel/sample_filter.skel.h
  libbpf: elf: skipping unrecognized data section(18) .eh_frame
  libbpf: elf: skipping relo section(19) .rel.eh_frame for section(18) .eh_frame
  libbpf: failed to find BTF for extern 'bpf_cast_to_kern_ctx' [21] section: -2
  Error: failed to open BPF object file: No such file or directory
  make[2]: *** [/tmp/build/perf/util/bpf_skel/sample_filter.skel.h] Error 254
  make[2]: *** Deleting file `/tmp/build/perf/util/bpf_skel/sample_filter.skel.h'

Ubuntu 20.04:

  clang version 10.0.0-4ubuntu1

    CLANG   /tmp/build/perf/util/bpf_skel/.tmp/augmented_raw_syscalls.bpf.o
    GENSKEL /tmp/build/perf/util/bpf_skel/bench_uprobe.skel.h
    GENSKEL /tmp/build/perf/util/bpf_skel/bperf_leader.skel.h
  libbpf: sec '.reluprobe': corrupted symbol #27 pointing to invalid section #65522 for relo #0
    GENSKEL /tmp/build/perf/util/bpf_skel/bperf_follower.skel.h
  Error: failed to open BPF object file: BPF object format invalid
  make[2]: *** [Makefile.perf:1121: /tmp/build/perf/util/bpf_skel/bench_uprobe.skel.h] Error 95
  make[2]: *** Deleting file '/tmp/build/perf/util/bpf_skel/bench_uprobe.skel.h'

So check that the version is at least 12.0.1; otherwise disable building
the BPF skeletons, print a message about it and continue the build.

The message, when running on amazonlinux:2:

  Makefile.config:698: Warning: Disabled BPF skeletons as reliable BTF generation needs at least clang version 12.0.1
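
The check itself lives in the perf build files and is not reproduced
here. As a rough illustration only, the "at least 12.0.1" comparison
could be sketched in C as below; parse_version() and version_at_least()
are hypothetical names, and distro suffixes such as "-2~deb10u1" are
assumed to follow the numeric part:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Parse "major.minor.patch" from the start of s; missing parts count as 0.
 * Returns false if not even a major number can be read. */
bool parse_version(const char *s, int v[3])
{
	assert(s != NULL);
	v[0] = v[1] = v[2] = 0;
	return sscanf(s, "%d.%d.%d", &v[0], &v[1], &v[2]) >= 1;
}

/* True if version string 'have' is at least 'need'. */
bool version_at_least(const char *have, const char *need)
{
	int h[3], n[3];

	if (!parse_version(have, h) || !parse_version(need, n))
		return false;
	for (int i = 0; i < 3; i++) {
		/* The first differing component decides the comparison. */
		if (h[i] != n[i])
			return h[i] > n[i];
	}
	return true;
}
```

With this sketch the three failing versions quoted above (11.0.1, 11.1.0
and 10.0.0) all compare below 12.0.1, so a caller would skip the
skeletons for them.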

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/ZTvGx/Ou6BVnYBqi@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf callchain: Fix spelling mistake "statisitcs" -> "statistics"
Colin Ian King [Fri, 27 Oct 2023 08:46:33 +0000 (09:46 +0100)]
perf callchain: Fix spelling mistake "statisitcs" -> "statistics"

There are a couple of spelling mistakes in perror messages. Fix them.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Cc: kernel-janitors@vger.kernel.org
Link: https://lore.kernel.org/r/20231027084633.1167530-1-colin.i.king@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf report: Fix spelling mistake "heirachy" -> "hierarchy"
Colin Ian King [Fri, 27 Oct 2023 08:40:11 +0000 (09:40 +0100)]
perf report: Fix spelling mistake "heirachy" -> "hierarchy"

There is a spelling mistake in a ui error message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Cc: kernel-janitors@vger.kernel.org
Link: https://lore.kernel.org/r/20231027084011.1167091-1-colin.i.king@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf python: Fix binding linkage due to rename and move of evsel__increase_rlimit()
Arnaldo Carvalho de Melo [Fri, 27 Oct 2023 13:33:30 +0000 (10:33 -0300)]
perf python: Fix binding linkage due to rename and move of evsel__increase_rlimit()

The changes in ("perf evsel: Rename evsel__increase_rlimit to
rlimit__increase_nofile") ended up breaking the python binding, which now
references the rlimit__increase_nofile function. Add util/rlimit.o to
tools/perf/util/python-ext-sources to cure that.

This was detected by the 'perf test python' regression test:

  $ perf test python
   14: 'import perf' in python        : FAILED!

  $ perf test -v python
  Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
   14: 'import perf' in python                                         :
  --- start ---
  test child forked, pid 2912462
  python usage test: "echo "import sys ; sys.path.insert(0, '/tmp/build/perf-tools-next/python'); import perf" | '/usr/bin/python3' "
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  ImportError: /tmp/build/perf-tools-next/python/perf.cpython-311-x86_64-linux-gnu.so: undefined symbol: rlimit__increase_nofile
  test child finished with -1
  ---- end ----
  'import perf' in python: FAILED!
  $

Fixes: e093a222d7cba1eb ("perf evsel: Rename evsel__increase_rlimit to rlimit__increase_nofile")
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/lkml/ZTrCS5Z3PZAmfPdV@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf tests: test_arm_coresight: Simplify source iteration
James Clark [Mon, 23 Oct 2023 13:15:49 +0000 (14:15 +0100)]
perf tests: test_arm_coresight: Simplify source iteration

There are two reasons to do this. Firstly, there is a shellcheck warning
in cs_etm_dev_name(), which can be completely deleted. Secondly, the
current iteration method doesn't support systems with both ETE and ETM
because it picks one or the other. There isn't a known system with this
configuration, but it could happen in the future.

Iterating over all the sources for each CPU can be done by going through
/sys/bus/event_source/devices/cs_etm/cpu* and following the symlink back
to the Coresight device in /sys/bus/coresight/devices. This will work
whether the device is ETE, ETM or any future name, is much simpler,
and doesn't require any hard-coded version numbers.
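
The test itself is a shell script. Purely to illustrate the "glob the
cpu* entries and follow each symlink" idea, a C sketch might look like
the following; list_link_targets() is a hypothetical helper and the
pattern is passed in by the caller:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <glob.h>
#include <stdio.h>
#include <stdlib.h>

/* Resolve every path matching 'pattern' (each expected to be a symlink) and
 * print "link -> target". Returns the number of links resolved, or -1 on a
 * glob error other than "no match". */
int list_link_targets(const char *pattern)
{
	glob_t g;
	int resolved = 0;
	int ret;

	assert(pattern != NULL);
	ret = glob(pattern, 0, NULL, &g);
	if (ret == GLOB_NOMATCH)
		return 0;
	if (ret != 0)
		return -1;

	for (size_t i = 0; i < g.gl_pathc; i++) {
		/* realpath() with a NULL buffer allocates the result. */
		char *target = realpath(g.gl_pathv[i], NULL);

		if (target) {
			printf("%s -> %s\n", g.gl_pathv[i], target);
			free(target);
			resolved++;
		}
	}
	globfree(&g);
	return resolved;
}
```

On a system with Coresight sources this would be called with
"/sys/bus/event_source/devices/cs_etm/cpu*"; on any other system it
simply returns 0.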

Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: tianruidong@linux.alibaba.com
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Anushree Mathur <anushree.mathur@linux.vnet.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: atrajeev@linux.vnet.ibm.com
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231023131550.487760-1-james.clark@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf vendor events intel: Add tigerlake two metrics
Ian Rogers [Tue, 26 Sep 2023 20:59:48 +0000 (13:59 -0700)]
perf vendor events intel: Add tigerlake two metrics

Add tma_info_system_socket_clks and uncore_freq metrics.

The associated converter script fix is in:
https://github.com/intel/perfmon/pull/112

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Link: https://lore.kernel.org/r/20230926205948.1399594-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf vendor events intel: Add broadwellde two metrics
Ian Rogers [Tue, 26 Sep 2023 20:59:47 +0000 (13:59 -0700)]
perf vendor events intel: Add broadwellde two metrics

Add tma_info_system_socket_clks and uncore_freq metrics that require a
broadwellx style uncore event for UNC_CLOCK.

The associated converter script fix is in:
https://github.com/intel/perfmon/pull/112

Fixes: 7d124303d620 ("perf vendor events intel: Update broadwell variant events/metrics")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Link: https://lore.kernel.org/r/20230926205948.1399594-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf vendor events intel: Fix broadwellde tma_info_system_dram_bw_use metric
Ian Rogers [Tue, 26 Sep 2023 03:10:34 +0000 (20:10 -0700)]
perf vendor events intel: Fix broadwellde tma_info_system_dram_bw_use metric

Broadwell-de has a consumer core and server uncore. The uncore_arb PMU
isn't present and the broadwellx style cbox PMU should be used
instead. Fix the tma_info_system_dram_bw_use metric to use the server
metric rather than client.

The associated converter script fix is in:
https://github.com/intel/perfmon/pull/111

Fixes: 7d124303d620 ("perf vendor events intel: Update broadwell variant events/metrics")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Link: https://lore.kernel.org/r/20230926031034.1201145-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit
Ian Rogers [Tue, 24 Oct 2023 22:23:14 +0000 (15:23 -0700)]
perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit

Fix a leak where mem_info__put wouldn't release the maps/map as used by
perf mem. Add exit functions and use them elsewhere that the maps and map
are released.
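
The general shape of such an exit function can be sketched as below.
This is not the perf code: struct counted, struct holder and the
counted_get()/counted_put()/holder_exit() helpers are hypothetical
stand-ins for the reference counted maps/map and their holder:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted object; not the perf struct map/maps. */
struct counted { int refcnt; };

struct counted *counted_get(struct counted *c)
{
	if (c)
		c->refcnt++;
	return c;
}

void counted_put(struct counted *c)
{
	if (c) {
		assert(c->refcnt > 0);
		c->refcnt--;
	}
}

/* Hypothetical holder of two references, in the spirit of a map_symbol. */
struct holder {
	struct counted *maps;
	struct counted *map;
};

/* Drop every reference the holder owns and clear the pointers, so that a
 * second call cannot release the same reference twice. */
void holder_exit(struct holder *h)
{
	counted_put(h->maps);
	h->maps = NULL;
	counted_put(h->map);
	h->map = NULL;
}
```

Keeping the release logic in one exit function means every owner of such
a holder releases both references the same way.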

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-12-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf callchain: Minor layout changes to callchain_list
Ian Rogers [Tue, 24 Oct 2023 22:23:13 +0000 (15:23 -0700)]
perf callchain: Minor layout changes to callchain_list

Avoid 6 byte hole for padding. Place more frequently used fields
first in an attempt to use just 1 cacheline in the common case.

Before:
```
struct callchain_list {
        u64                        ip;                   /*     0     8 */
        struct map_symbol          ms;                   /*     8    24 */
        struct {
                _Bool              unfolded;             /*    32     1 */
                _Bool              has_children;         /*    33     1 */
        };                                               /*    32     2 */

        /* XXX 6 bytes hole, try to pack */

        u64                        branch_count;         /*    40     8 */
        u64                        from_count;           /*    48     8 */
        u64                        predicted_count;      /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u64                        abort_count;          /*    64     8 */
        u64                        cycles_count;         /*    72     8 */
        u64                        iter_count;           /*    80     8 */
        u64                        iter_cycles;          /*    88     8 */
        struct branch_type_stat *  brtype_stat;          /*    96     8 */
        const char  *              srcline;              /*   104     8 */
        struct list_head           list;                 /*   112    16 */

        /* size: 128, cachelines: 2, members: 13 */
        /* sum members: 122, holes: 1, sum holes: 6 */
};
```

After:
```
struct callchain_list {
        struct list_head           list;                 /*     0    16 */
        u64                        ip;                   /*    16     8 */
        struct map_symbol          ms;                   /*    24    24 */
        const char  *              srcline;              /*    48     8 */
        u64                        branch_count;         /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u64                        from_count;           /*    64     8 */
        u64                        cycles_count;         /*    72     8 */
        u64                        iter_count;           /*    80     8 */
        u64                        iter_cycles;          /*    88     8 */
        struct branch_type_stat *  brtype_stat;          /*    96     8 */
        u64                        predicted_count;      /*   104     8 */
        u64                        abort_count;          /*   112     8 */
        struct {
                _Bool              unfolded;             /*   120     1 */
                _Bool              has_children;         /*   121     1 */
        };                                               /*   120     2 */

        /* size: 128, cachelines: 2, members: 13 */
        /* padding: 6 */
};
```
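
The difference between a hole and tail padding can be checked with
offsetof() and sizeof(). The structs below are hypothetical reduced
examples, not struct callchain_list, and the byte counts in the comments
assume a common 64-bit ABI where a 64-bit integer is 8-byte aligned:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced structs, not the real struct callchain_list. */
struct flags_first {
	uint64_t ip;
	bool unfolded;
	bool has_children;
	/* on common 64-bit ABIs: 6 byte hole so 'count' is 8-byte aligned */
	uint64_t count;
};

struct flags_last {
	uint64_t ip;
	uint64_t count;
	bool unfolded;
	bool has_children;
	/* on common 64-bit ABIs: 6 bytes of tail padding instead of a hole */
};

/* Bytes between the end of has_children and the start of count. */
size_t flags_first_hole(void)
{
	size_t end = offsetof(struct flags_first, has_children) + sizeof(bool);

	assert(offsetof(struct flags_first, count) >= end);
	return offsetof(struct flags_first, count) - end;
}

/* Bytes after the last member, i.e. tail padding. */
size_t flags_last_padding(void)
{
	size_t end = offsetof(struct flags_last, has_children) + sizeof(bool);

	assert(sizeof(struct flags_last) >= end);
	return sizeof(struct flags_last) - end;
}
```

In this sketch the total size does not change; the unused bytes move
from the middle of the struct to its end, which lets the other members
sit closer together.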

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-11-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf callchain: Make brtype_stat in callchain_list optional
Ian Rogers [Tue, 24 Oct 2023 22:23:12 +0000 (15:23 -0700)]
perf callchain: Make brtype_stat in callchain_list optional

struct callchain_list is 352 bytes in size, 232 of which are
brtype_stat. brtype_stat is only used for certain callchain_list
items so make it optional, allocating when necessary. So that
printing doesn't need to deal with an optional brtype_stat, pass
an empty/zero version.

Before:
```
struct callchain_list {
        u64                        ip;                   /*     0     8 */
        struct map_symbol          ms;                   /*     8    24 */
        struct {
                _Bool              unfolded;             /*    32     1 */
                _Bool              has_children;         /*    33     1 */
        };                                               /*    32     2 */

        /* XXX 6 bytes hole, try to pack */

        u64                        branch_count;         /*    40     8 */
        u64                        from_count;           /*    48     8 */
        u64                        predicted_count;      /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u64                        abort_count;          /*    64     8 */
        u64                        cycles_count;         /*    72     8 */
        u64                        iter_count;           /*    80     8 */
        u64                        iter_cycles;          /*    88     8 */
        struct branch_type_stat    brtype_stat;          /*    96   232 */
        /* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */
        const char  *              srcline;              /*   328     8 */
        struct list_head           list;                 /*   336    16 */

        /* size: 352, cachelines: 6, members: 13 */
        /* sum members: 346, holes: 1, sum holes: 6 */
        /* last cacheline: 32 bytes */
};
```

After:
```
struct callchain_list {
        u64                        ip;                   /*     0     8 */
        struct map_symbol          ms;                   /*     8    24 */
        struct {
                _Bool              unfolded;             /*    32     1 */
                _Bool              has_children;         /*    33     1 */
        };                                               /*    32     2 */

        /* XXX 6 bytes hole, try to pack */

        u64                        branch_count;         /*    40     8 */
        u64                        from_count;           /*    48     8 */
        u64                        predicted_count;      /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u64                        abort_count;          /*    64     8 */
        u64                        cycles_count;         /*    72     8 */
        u64                        iter_count;           /*    80     8 */
        u64                        iter_cycles;          /*    88     8 */
        struct branch_type_stat *  brtype_stat;          /*    96     8 */
        const char  *              srcline;              /*   104     8 */
        struct list_head           list;                 /*   112    16 */

        /* size: 128, cachelines: 2, members: 13 */
        /* sum members: 122, holes: 1, sum holes: 6 */
};
```
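
A minimal sketch of the "allocate when necessary, pass an empty version
to printing" pattern follows. The types and helpers (struct stat_block,
struct node, node_stats_get(), node_stats_for_display(), node_exit())
are hypothetical and only mirror the idea, not the perf implementation:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the embedded statistics. */
struct stat_block {
	uint64_t counts[29];	/* 232 bytes, matching the size quoted above */
};

struct node {
	uint64_t ip;
	struct stat_block *stats;	/* NULL until first needed */
};

/* Allocate the statistics lazily; returns NULL on allocation failure. */
struct stat_block *node_stats_get(struct node *n)
{
	assert(n != NULL);
	if (!n->stats)
		n->stats = calloc(1, sizeof(*n->stats));
	return n->stats;
}

/* Display code takes a const pointer and never sees NULL: nodes without
 * statistics get a shared all-zero block instead. */
const struct stat_block *node_stats_for_display(const struct node *n)
{
	static const struct stat_block empty;	/* zero-initialized */

	return n->stats ? n->stats : &empty;
}

/* Release the optional allocation. */
void node_exit(struct node *n)
{
	free(n->stats);
	n->stats = NULL;
}
```

Nodes that never record branch statistics then pay for one pointer
rather than the full block.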

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-10-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf callchain: Make display use of branch_type_stat const
Ian Rogers [Tue, 24 Oct 2023 22:23:11 +0000 (15:23 -0700)]
perf callchain: Make display use of branch_type_stat const

Display code doesn't modify the branch_type_stat so switch uses to
const. This is done to aid refactoring struct callchain_list where
currently the branch_type_stat is embedded even if not used.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf offcpu: Add missed btf_free
Ian Rogers [Tue, 24 Oct 2023 22:23:10 +0000 (15:23 -0700)]
perf offcpu: Add missed btf_free

Caught by address/leak sanitizer.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf threads: Remove unused dead thread list
Ian Rogers [Tue, 24 Oct 2023 22:23:09 +0000 (15:23 -0700)]
perf threads: Remove unused dead thread list

Commit 40826c45eb0b ("perf thread: Remove notion of dead threads")
removed dead threads but the list head wasn't removed. Remove it here.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago perf hist: Add missing puts to hist__account_cycles
Ian Rogers [Tue, 24 Oct 2023 22:23:08 +0000 (15:23 -0700)]
perf hist: Add missing puts to hist__account_cycles

Caught using reference count checking on perf top with
"--call-graph=lbr". After this no memory leaks were detected.

Fixes: 57849998e2cd ("perf report: Add processing for cycle histograms")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago libperf rc_check: Add RC_CHK_EQUAL
Ian Rogers [Tue, 24 Oct 2023 22:23:07 +0000 (15:23 -0700)]
libperf rc_check: Add RC_CHK_EQUAL

Comparing pointers when reference count checking is enabled is tricky to
do without causing a SEGV. Add a convenience macro to simplify this and
use it.
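
To show why the comparison is tricky, here is a sketch that assumes
each reference is a small wrapper ("handle") pointing at the real
object, so two different handles may refer to the same object and either
may be NULL. HANDLE_EQUAL and the struct names are hypothetical and are
not the macro added by this patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types: 'handle' stands in for the per-reference wrapper,
 * 'orig' for the real object it points at. */
struct object { int value; };
struct handle { struct object *orig; };

/* Equal if they are the same handle (including both NULL), or if both are
 * non-NULL and wrap the same underlying object. Never dereferences NULL. */
#define HANDLE_EQUAL(a, b) \
	((a) == (b) || ((a) && (b) && (a)->orig == (b)->orig))

bool handles_equal(const struct handle *a, const struct handle *b)
{
	return HANDLE_EQUAL(a, b);
}

/* Usage: two distinct handles to one object compare equal. */
void handle_equal_demo(void)
{
	struct object obj = { .value = 1 };
	struct handle h1 = { .orig = &obj }, h2 = { .orig = &obj };
	struct handle *a = &h1, *b = &h2, *none1 = NULL, *none2 = NULL;

	assert(a != b);
	assert(HANDLE_EQUAL(a, b));
	assert(!HANDLE_EQUAL(a, none1));
	assert(HANDLE_EQUAL(none1, none2));
}
```

A plain a == b would report the two handles as different, while
a->orig == b->orig alone would crash when either side is NULL.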

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months ago libperf rc_check: Make implicit enabling work for GCC
Ian Rogers [Tue, 24 Oct 2023 22:23:06 +0000 (15:23 -0700)]
libperf rc_check: Make implicit enabling work for GCC

Make the implicit REFCOUNT_CHECKING robust when building with GCC.

Fixes: 9be6ab181b7b ("libperf rc_check: Enable implicitly with sanitizers")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months agoperf machine: Avoid out of bounds LBR memory read
Ian Rogers [Tue, 24 Oct 2023 22:23:05 +0000 (15:23 -0700)]
perf machine: Avoid out of bounds LBR memory read

Running perf top with address sanitizer and "--call-graph=lbr" fails
due to reading sample 0 when no samples exist. Add a guard to prevent
this.
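
A minimal sketch of such a guard, assuming a simplified branch stack with a
count and an array of source addresses. The struct lbr_stack_sketch and the
helper lbr_first_from() are hypothetical and only illustrate checking the
count before touching entry 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified branch stack: 'nr' valid entries in 'from'. */
struct lbr_stack_sketch {
	uint64_t nr;
	const uint64_t *from;
};

/* Store the first branch source in *ip and return 0, or return -1 if there is none. */
static int lbr_first_from(const struct lbr_stack_sketch *stack, uint64_t *ip)
{
	assert(ip != NULL);
	if (stack == NULL || stack->nr == 0)
		return -1;	/* nothing recorded: do not touch entry 0 */
	*ip = stack->from[0];
	return 0;
}
```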

Fixes: e2b23483eb1d ("perf machine: Factor out lbr_callchain_add_lbr_ip()")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months agoperf rwsem: Add debug mode that uses a mutex
Ian Rogers [Tue, 24 Oct 2023 22:23:04 +0000 (15:23 -0700)]
perf rwsem: Add debug mode that uses a mutex

Mutex error checking will catch attempts to take the lock recursively and
other problems that a rwlock won't. At the expense of concurrency, add a
debug mode that uses a mutex in place of a rwsem.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months agoperf build: Address stray '\' before # that is warned about since grep 3.8
Arnaldo Carvalho de Melo [Wed, 25 Oct 2023 11:23:05 +0000 (08:23 -0300)]
perf build: Address stray '\' before # that is warned about since grep 3.8

To address this grep 3.8 warning:

  grep: warning: stray \ before #

We needed to remove the '' around the grep expression and keep the \
before # so that it is escaped by the $(shell grep ...) and thus doesn't
get to grep.

We need that \ before the #, otherwise we get this:

  Makefile.perf:364: *** unterminated call to function 'shell': missing ')'.  Stop.

As everything after the # will be considered a comment.

Removing the single quotes needs some more escaping so that _some_ of
the escaped chars get to grep, like the '\|' that becomes '\\\|'.

Running on debian:10, where there is no libtraceevent-devel available,
we get:

  Makefile.perf:367: *** PYTHON_EXT_SRCS= util/python.c ../lib/ctype.c util/cap.c util/evlist.c util/evsel.c util/evsel_fprintf.c util/perf_event_attr_fprintf.c util/cpumap.c util/memswap.c util/mmap.c util/namespaces.c ../lib/bitmap.c ../lib/find_bit.c ../lib/list_sort.c ../lib/hweight.c ../lib/string.c ../lib/vsprintf.c util/thread_map.c util/util.c util/cgroup.c util/parse-branch-options.c util/rblist.c util/counts.c util/print_binary.c util/strlist.c ../lib/rbtree.c util/string.c util/symbol_fprintf.c util/units.c util/affinity.c util/rwsem.c util/hashmap.c util/perf_regs.c util/fncache.c util/perf-regs-arch/perf_regs_aarch64.c util/perf-regs-arch/perf_regs_arm.c util/perf-regs-arch/perf_regs_csky.c util/perf-regs-arch/perf_regs_loongarch.c util/perf-regs-arch/perf_regs_mips.c util/perf-regs-arch/perf_regs_powerpc.c util/perf-regs-arch/perf_regs_riscv.c util/perf-regs-arch/perf_regs_s390.c util/perf-regs-arch/perf_regs_x86.c.  Stop.
  make[1]: *** [Makefile.perf:242: sub-make] Error 2

I.e. both the comments and the util/trace-event.c were removed.

When using:

msg := $(error PYTHON_EXT_SRCS=$(PYTHON_EXT_SRCS))

While on the more recent fedora:38, with the new grep and make packages
and libtraceevent-devel installed:

  Makefile.perf:367: *** PYTHON_EXT_SRCS= util/python.c ../lib/ctype.c util/cap.c util/evlist.c util/evsel.c util/evsel_fprintf.c util/perf_event_attr_fprintf.c util/cpumap.c util/memswap.c util/mmap.c util/namespaces.c ../lib/bitmap.c ../lib/find_bit.c ../lib/list_sort.c ../lib/hweight.c ../lib/string.c ../lib/vsprintf.c util/thread_map.c util/util.c util/cgroup.c util/parse-branch-options.c util/rblist.c util/counts.c util/print_binary.c util/strlist.c util/trace-event.c ../lib/rbtree.c util/string.c util/symbol_fprintf.c util/units.c util/affinity.c util/rwsem.c util/hashmap.c util/perf_regs.c util/fncache.c util/perf-regs-arch/perf_regs_aarch64.c util/perf-regs-arch/perf_regs_arm.c util/perf-regs-arch/perf_regs_csky.c util/perf-regs-arch/perf_regs_loongarch.c util/perf-regs-arch/perf_regs_mips.c util/perf-regs-arch/perf_regs_powerpc.c util/perf-regs-arch/perf_regs_riscv.c util/perf-regs-arch/perf_regs_s390.c util/perf-regs-arch/perf_regs_x86.c.  Stop.
  make[1]: *** [Makefile.perf:242: sub-make] Error 2
  make: *** [Makefile:113: install-bin] Error 2
  make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
  $

I.e. only the comments were removed.

If we build it on the same fedora:38 system, but using NO_LIBTRACEEVENT=1

  $ make NO_LIBTRACEEVENT=1 CORESIGHT=1 O=/tmp/build/$(basename $PWD) -C tools/perf install-bin
  Makefile.perf:367: *** PYTHON_EXT_SRCS= util/python.c ../lib/ctype.c util/cap.c util/evlist.c util/evsel.c util/evsel_fprintf.c util/perf_event_attr_fprintf.c util/cpumap.c util/memswap.c util/mmap.c util/namespaces.c ../lib/bitmap.c ../lib/find_bit.c ../lib/list_sort.c ../lib/hweight.c ../lib/string.c ../lib/vsprintf.c util/thread_map.c util/util.c util/cgroup.c util/parse-branch-options.c util/rblist.c util/counts.c util/print_binary.c util/strlist.c ../lib/rbtree.c util/string.c util/symbol_fprintf.c util/units.c util/affinity.c util/rwsem.c util/hashmap.c util/perf_regs.c util/fncache.c util/perf-regs-arch/perf_regs_aarch64.c util/perf-regs-arch/perf_regs_arm.c util/perf-regs-arch/perf_regs_csky.c util/perf-regs-arch/perf_regs_loongarch.c util/perf-regs-arch/perf_regs_mips.c util/perf-regs-arch/perf_regs_powerpc.c util/perf-regs-arch/perf_regs_riscv.c util/perf-regs-arch/perf_regs_s390.c util/perf-regs-arch/perf_regs_x86.c.  Stop.
  make[1]: *** [Makefile.perf:242: sub-make] Error 2
  make: *** [Makefile:113: install-bin] Error 2
  make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
  $

Both comments and the util/trace-event.c file removed.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/ZTj6mfM9UqY2DggC@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months agoperf report: Fix hierarchy mode on pipe input
Namhyung Kim [Wed, 25 Oct 2023 00:31:21 +0000 (17:31 -0700)]
perf report: Fix hierarchy mode on pipe input

The hierarchy mode needs to set up output formats for each evsel.
Normally setup_sorting() handles this at the beginning, but it cannot
do that if data comes from a pipe since there's no evsel info before
reading the data.  As a result, perf report cannot process the samples
in hierarchy mode and behaves as if there were no samples.

Let's check the condition and set up the output formats after reading
data so that it can find evsels.

Before:

  $ ./perf record -o- true | ./perf report -i- --hierarchy -q
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.000 MB - ]
  Error:
  The - data has no samples!

After:

  $ ./perf record -o- true | ./perf report -i- --hierarchy -q
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.000 MB - ]
      94.76%        true
         94.76%        [kernel.kallsyms]
            94.76%        [k] filemap_fault
       5.24%        perf-ex
          5.24%        [kernel.kallsyms]
             5.06%        [k] __memset
             0.18%        [k] native_write_msr

Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20231025003121.2811738-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months agoperf lock contention: Use per-cpu array map for spinlocks
Namhyung Kim [Fri, 20 Oct 2023 20:47:41 +0000 (13:47 -0700)]
perf lock contention: Use per-cpu array map for spinlocks

Currently the lock contention timestamp is maintained in a hash map keyed
by pid.  That means it needs to get and release a map element (which is
protected by a spinlock!) on each contention begin and end pair.  This
can hurt performance if there is a lot of contention (usually from
spinlocks).

It used to go with task local storage but it had an issue on memory
allocation in some critical paths.  Although it's addressed in recent
kernels IIUC, the tool should support old kernels too.  So it cannot
simply switch to the task local storage at least for now.

As spinlocks create lots of contention and they disable preemption
while spinning, it can use a per-cpu array to keep the timestamp and
avoid the overhead of hashmap update and delete.

In contention_begin, it's easy to check the lock types since it can see
the flags.  But contention_end cannot see them.  So let's try the per-cpu
array first (unconditionally) and see if it has an active element
(lock != 0).  If so, it should be used, and the per-task tstamp map should
not be used until the per-cpu array element is cleared, which means the
nested spinlock contention (if any) has finished and it now sees the
(outer) lock.
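
The selection rule for contention_end described above can be sketched in
plain C as follows. This is a userspace model, not the BPF program; the
struct tstamp_sketch and the helper pick_tstamp() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timestamp record; lock == 0 means "not in use". */
struct tstamp_sketch {
	uint64_t timestamp;
	uint64_t lock;
};

/* Choose the record for contention_end: an active per-cpu slot wins over the per-task one. */
static struct tstamp_sketch *pick_tstamp(struct tstamp_sketch *percpu_slot,
					 struct tstamp_sketch *pertask_elem)
{
	assert(percpu_slot != NULL);	/* an array slot exists for every CPU */
	if (percpu_slot->lock != 0)
		return percpu_slot;
	return pertask_elem;	/* may be NULL when nothing is tracked */
}
```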

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231020204741.1869520-3-namhyung@kernel.org
10 months agoperf lock contention: Check race in tstamp elem creation
Namhyung Kim [Fri, 20 Oct 2023 20:47:40 +0000 (13:47 -0700)]
perf lock contention: Check race in tstamp elem creation

When pelem is NULL, it'd create a new entry with zero data.  But it
might be preempted by IRQ/NMI just before calling bpf_map_update_elem(),
and then there's a chance to call it twice for the same pid.  So it'd be
better to use the BPF_NOEXIST flag and check the return value to prevent
the race.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231020204741.1869520-2-namhyung@kernel.org
10 months agoperf lock contention: Clear lock addr after use
Namhyung Kim [Fri, 20 Oct 2023 20:47:39 +0000 (13:47 -0700)]
perf lock contention: Clear lock addr after use

It checks the current lock to calculate the delta of contention time.
The address is saved in the tstamp map which is allocated at the beginning
of contention and released at the end of contention.

But it's possible for bpf_map_delete_elem() to fail.  In that case, the
element in the tstamp map is kept for the current lock and it makes the
next contention for the same lock tracked incorrectly.  Specifically,
the next contention begin will see the existing element for the task and
it'd just return.  Then the next contention end will see the element and
calculate the time using the timestamp for the previous begin.

This can result in a large value for two small contentions that happen
from time to time.  Let's clear the lock address so that it can be updated
next time even if the bpf_map_delete_elem() failed.
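
The failure mode can be modelled in plain C, assuming the element survives
between two contentions (a failed delete). This is a userspace sketch, not
the BPF program; struct tstamp_sketch, contention_begin() and
contention_end() here are hypothetical, and the clear_lock flag only exists
to compare the behaviour with and without the fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timestamp record; lock == 0 means "not in use". */
struct tstamp_sketch {
	uint64_t timestamp;
	uint64_t lock;
};

/* Start tracking 'lock' unless the element is still marked active. */
static void contention_begin(struct tstamp_sketch *e, uint64_t lock, uint64_t now)
{
	assert(e != NULL);
	if (e->lock != 0)
		return;		/* stale element: the old timestamp is kept */
	e->timestamp = now;
	e->lock = lock;
}

/* Return the contention time; optionally clear the lock address afterwards. */
static uint64_t contention_end(struct tstamp_sketch *e, uint64_t lock,
			       uint64_t now, bool clear_lock)
{
	uint64_t delta;

	assert(e != NULL);
	if (e->lock != lock)
		return 0;
	delta = now - e->timestamp;
	if (clear_lock)
		e->lock = 0;	/* lets the next begin refresh the timestamp */
	return delta;
}
```

In this model, two contentions of 10 time units starting at t=100 and
t=1000 are reported as 10 and 910 when the lock address is left in place,
and as 10 and 10 when it is cleared.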

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231020204741.1869520-1-namhyung@kernel.org
10 months agoperf evsel: Rename evsel__increase_rlimit to rlimit__increase_nofile
Yang Jihong [Mon, 23 Oct 2023 03:31:44 +0000 (03:31 +0000)]
perf evsel: Rename evsel__increase_rlimit to rlimit__increase_nofile

The evsel__increase_rlimit() helper does nothing with evsel, and the
description of its functionality is inaccurate.  Rename it and move it to
util/rlimit.c.

By the way, fix a checkpatch warning about a misplaced license tag:

  WARNING: Misplaced SPDX-License-Identifier tag - use line 1 instead
  #160: FILE: tools/perf/util/rlimit.h:3:
  /* SPDX-License-Identifier: LGPL-2.1 */

No functional change.

Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20231023033144.1011896-1-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
10 months agoperf bench sched pipe: Add -G/--cgroups option
Namhyung Kim [Tue, 17 Oct 2023 20:23:42 +0000 (13:23 -0700)]
perf bench sched pipe: Add -G/--cgroups option

The -G/--cgroups option is to put sender and receiver in different
cgroups in order to measure cgroup context switch overheads.

Users need to make sure the cgroups exist and are accessible.  The following
example shows the effect of this change.  Please don't forget taskset
before the perf bench to measure cgroup switches properly.  Otherwise
each task would run on a different CPU and generate cgroup switches
regardless of this change.

  # perf stat -e context-switches,cgroup-switches \
  > taskset -c 0 perf bench sched pipe -l 10000 > /dev/null

   Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 10000':

              20,001      context-switches
                   2      cgroup-switches

         0.053449651 seconds time elapsed

         0.011286000 seconds user
         0.041869000 seconds sys

  # perf stat -e context-switches,cgroup-switches \
  > taskset -c 0 perf bench sched pipe -l 10000 -G AAA,BBB > /dev/null

   Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 10000 -G AAA,BBB':

              20,001      context-switches
              20,001      cgroup-switches

         0.052768627 seconds time elapsed

         0.006284000 seconds user
         0.046266000 seconds sys

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20231017202342.1353124-1-namhyung@kernel.org
10 months agoperf test: Skip CoreSight tests if cs_etm// event is not available
Michael Petlan [Thu, 19 Oct 2023 09:11:37 +0000 (11:11 +0200)]
perf test: Skip CoreSight tests if cs_etm// event is not available

CoreSight might not be available; in that case, skip the tests.

Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: vmolnaro@redhat.com
Link: https://lore.kernel.org/r/20231019091137.22525-1-mpetlan@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf data: Increase RLIMIT_NOFILE limit when open too many files in perf_data__create...
Yang Jihong [Fri, 13 Oct 2023 07:59:45 +0000 (07:59 +0000)]
perf data: Increase RLIMIT_NOFILE limit when open too many files in perf_data__create_dir()

If using parallel threads to collect data, perf record needs at least 6 fds
per CPU (one for sys_perf_event_open, four for the msg and ack pipes, see
record__thread_data_open_pipes(), and one for opening perf.data.XXX).
For an environment with more than 100 cores, if perf record uses both the
`-a` and `--threads` options, it is easy to exceed the upper limit of the
file descriptor number.  When we run out of them, try to increase the limit.

Before:
  $ ulimit -n
  1024
  $ lscpu | grep 'On-line CPU(s)'
  On-line CPU(s) list:                0-159
  $ perf record --threads -a sleep 1
  Failed to create data directory: Too many open files

After:
  $ ulimit -n
  1024
  $ lscpu | grep 'On-line CPU(s)'
  On-line CPU(s) list:                0-159
  $ perf record --threads -a sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.394 MB perf.data (1576 samples) ]

Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20231013075945.698874-1-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf vendor events: Update PMC used in PM_RUN_INST_CMPL event for power10 platform
Kajol Jain [Mon, 16 Oct 2023 14:31:10 +0000 (20:01 +0530)]
perf vendor events: Update PMC used in PM_RUN_INST_CMPL event for power10 platform

The CPI_STALL_RATIO metric group can be used to present the high
level CPI stall breakdown metrics in powerpc, which will show:

- DISPATCH_STALL_CPI ( Dispatch stall cycles per insn )
- ISSUE_STALL_CPI ( Issue stall cycles per insn )
- EXECUTION_STALL_CPI ( Execution stall cycles per insn )
- COMPLETION_STALL_CPI ( Completion stall cycles per insn )

Commit cf26e043c2a9 ("perf vendor events power10: Add JSON
metric events to present CPI stall cycles in powerpc"), which added
the CPI_STALL_RATIO metric group, also modified
the PMC value used in PM_RUN_INST_CMPL event from PMC4 to PMC5,
to avoid multiplexing of events.
But that got reverted in recent changes. Fix this issue by changing
the PMC value used in PM_RUN_INST_CMPL back to PMC5.

Result with the fix:

 ./perf stat --metric-no-group -M CPI_STALL_RATIO <workload>

 Performance counter stats for 'workload':

        68,745,426      PM_CMPL_STALL                    #     0.21 COMPLETION_STALL_CPI
         7,692,827      PM_ISSUE_STALL                   #     0.02 ISSUE_STALL_CPI
       322,638,223      PM_RUN_INST_CMPL                 #     0.05 DISPATCH_STALL_CPI
                                                  #     0.48 EXECUTION_STALL_CPI
        16,858,553      PM_DISP_STALL_CYC
       153,880,133      PM_EXEC_STALL

       0.089774592 seconds time elapsed

"--metric-no-group" is used for forcing PM_RUN_INST_CMPL to be scheduled
in all groups for more accuracy.
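
As a worked check of the output above, the sketch below assumes each
*_STALL_CPI metric is the corresponding stall count divided by
PM_RUN_INST_CMPL, which is what "stall cycles per insn" suggests. The
helper stall_cpi() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stall cycles per completed instruction. */
static double stall_cpi(uint64_t stall_cycles, uint64_t run_inst_cmpl)
{
	assert(run_inst_cmpl != 0);	/* a per-instruction ratio needs instructions */
	return (double)stall_cycles / (double)run_inst_cmpl;
}
```

Under that assumption the counts above give 68,745,426 / 322,638,223 = 0.21
(completion), 7,692,827 / 322,638,223 = 0.02 (issue), 16,858,553 /
322,638,223 = 0.05 (dispatch) and 153,880,133 / 322,638,223 = 0.48
(execution), matching the reported values to two decimal places.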

Fixes: 7d473f475b2a ("perf vendor events: Move JSON/events to appropriate files for power10 platform")
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel <disgoel@linux.ibm.com>
Cc: maddy@linux.ibm.com
Link: https://lore.kernel.org/r/20231016143110.244255-1-kjain@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf trace: Use the right bpf_probe_read(_str) variant for reading user data
Thomas Richter [Thu, 19 Oct 2023 08:26:42 +0000 (10:26 +0200)]
perf trace: Use the right bpf_probe_read(_str) variant for reading user data

Perf test case 111 Check open filename arg using perf trace + vfs_getname
fails on s390. This is caused by a failing function
bpf_probe_read() in file util/bpf_skel/augmented_raw_syscalls.bpf.c.

The root cause is the lookup by address. Function bpf_probe_read()
is used. This function works only for architectures
with ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.

On s390 it is not possible to determine from the address which
address space it belongs to (user or kernel space).

Replace bpf_probe_read() by bpf_probe_read_kernel()
and bpf_probe_read_str() by bpf_probe_read_user_str() to
explicitly specify the address space the address refers to.

Output before:
 # ./perf trace -eopen,openat -- touch /tmp/111
 libbpf: prog 'sys_enter': BPF program load failed: Invalid argument
 libbpf: prog 'sys_enter': -- BEGIN PROG LOAD LOG --
 reg type unsupported for arg#0 function sys_enter#75
 0: R1=ctx(off=0,imm=0) R10=fp0
 ; int sys_enter(struct syscall_enter_args *args)
 0: (bf) r6 = r1           ; R1=ctx(off=0,imm=0) R6_w=ctx(off=0,imm=0)
 ; return bpf_get_current_pid_tgid();
 1: (85) call bpf_get_current_pid_tgid#14      ; R0_w=scalar()
 2: (63) *(u32 *)(r10 -8) = r0 ; R0_w=scalar() R10=fp0 fp-8=????mmmm
 3: (bf) r2 = r10              ; R2_w=fp0 R10=fp0
 ;
 .....
 lines deleted here
 .....
 23: (bf) r3 = r6              ; R3_w=ctx(off=0,imm=0) R6=ctx(off=0,imm=0)
 24: (85) call bpf_probe_read#4
 unknown func bpf_probe_read#4
 processed 23 insns (limit 1000000) max_states_per_insn 0 \
 total_states 2 peak_states 2 mark_read 2
 -- END PROG LOAD LOG --
 libbpf: prog 'sys_enter': failed to load: -22
 libbpf: failed to load object 'augmented_raw_syscalls_bpf'
 libbpf: failed to load BPF skeleton 'augmented_raw_syscalls_bpf': -22
 ....

Output after:
 # ./perf test -Fv 111
 111: Check open filename arg using perf trace + vfs_getname          :
 --- start ---
     1.085 ( 0.011 ms): touch/320753 openat(dfd: CWD, filename: \
"/tmp/temporary_file.SWH85", \
flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
 ---- end ----
 Check open filename arg using perf trace + vfs_getname: Ok
 #

Test with the sleep command shows:
Output before:
 # ./perf trace -e *sleep sleep 1.234567890
     0.000 (1234.681 ms): sleep/63114 clock_nanosleep(rqtp: \
         { .tv_sec: 0, .tv_nsec: 0 }, rmtp: 0x3ffe0979720) = 0
 #

Output after:
 # ./perf trace -e *sleep sleep 1.234567890
     0.000 (1234.686 ms): sleep/64277 clock_nanosleep(rqtp: \
         { .tv_sec: 1, .tv_nsec: 234567890 }, rmtp: 0x3fff3df9ea0) = 0
 #

Fixes: 14e4b9f4289a ("perf trace: Raw augmented syscalls fix libbpf 1.0+ compatibility")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Co-developed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: gor@linux.ibm.com
Cc: hca@linux.ibm.com
Cc: sumanthk@linux.ibm.com
Cc: svens@linux.ibm.com
Link: https://lore.kernel.org/r/20231019082642.3286650-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf tools: Do not ignore the default vmlinux.h
Namhyung Kim [Tue, 10 Oct 2023 23:42:47 +0000 (16:42 -0700)]
perf tools: Do not ignore the default vmlinux.h

The recent change made it possible to generate vmlinux.h from BTF and
to ignore the file.  But we also have a minimal vmlinux.h that will be
used by default.  It should not be ignored by GIT.

Fixes: b7a2d774c9c5 ("perf build: Add ability to build with a generated vmlinux.h")
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202310110451.rvdUZJEY-lkp@intel.com/
Cc: oe-kbuild-all@lists.linux.dev
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agotools/build: Fix -s detection code in tools/scripts/Makefile.include
Jiri Olsa [Sun, 8 Oct 2023 21:22:51 +0000 (23:22 +0200)]
tools/build: Fix -s detection code in tools/scripts/Makefile.include

As Dmitry described in the changelog of [1], the current way of detecting
the -s option is broken for new make.

Change the tools/build -s option detection the same way as it was
fixed for the root Makefile in [1].

[1] 4bf73588165b ("kbuild: Port silent mode detection to future gnu make.")

Cc: Dmitry Goncharov <dgoncharov@users.sf.net>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: KP Singh <kpsingh@chromium.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Quentin Monnet <quentin@isovalent.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: bpf@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20231008212251.236023-3-jolsa@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agotools/build: Fix -s detection code in tools/build/Makefile.build
Jiri Olsa [Sun, 8 Oct 2023 21:22:50 +0000 (23:22 +0200)]
tools/build: Fix -s detection code in tools/build/Makefile.build

As Dmitry described in the changelog of [1], the current way of detecting
the -s option is broken for new make.

Change the tools/build -s option detection the same way as it was
fixed for the root Makefile in [1].

[1] 4bf73588165b ("kbuild: Port silent mode detection to future gnu make.")

Cc: Dmitry Goncharov <dgoncharov@users.sf.net>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: KP Singh <kpsingh@chromium.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Quentin Monnet <quentin@isovalent.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: bpf@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20231008212251.236023-2-jolsa@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf: script: fix missing ',' for fields option
Changbin Du [Tue, 17 Oct 2023 01:55:24 +0000 (09:55 +0800)]
perf: script: fix missing ',' for fields option

A comma is missing at the end of the line.

Signed-off-by: Changbin Du <changbin.du@huawei.com>
Link: https://lore.kernel.org/r/20231017015524.797065-1-changbin.du@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf tests: Fix shellcheck warning in stat_all_metricgroups
Athira Rajeev [Fri, 13 Oct 2023 07:30:21 +0000 (13:00 +0530)]
perf tests: Fix shellcheck warning in stat_all_metricgroups

Running shellcheck on stat_all_metricgroups.sh reports
the warning below:

 In ./tests/shell/stat_all_metricgroups.sh line 7:
 function ParanoidAndNotRoot()
 ^-- SC2112: 'function' keyword is non-standard. Delete it.

As the warning says, "function" is a non-standard keyword for
declaring functions. Fix this by removing the
"function" keyword from the ParanoidAndNotRoot function.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: kjain@linux.ibm.com
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20231013073021.99794-4-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf tests: Fix shellcheck warning in record_sideband.sh
Athira Rajeev [Fri, 13 Oct 2023 07:30:20 +0000 (13:00 +0530)]
perf tests: Fix shellcheck warning in record_sideband.sh

Running shellcheck on record_sideband.sh throws the
warning below:

In tests/shell/record_sideband.sh line 25:
  if ! perf record -o ${perfdata} -BN --no-bpf-event -C $1 true 2>&1 >/dev/null
    ^--^ SC2069: To redirect stdout+stderr, 2>&1 must be last (or use '{ cmd > file; } 2>&1' to clarify).

This shows shellcheck warning SC2069 where the redirection
order needs to be fixed. Use "cmd > /dev/null 2>&1" to fix
the redirection of perf record output.

Fixes: 23b97c7ee963 ("perf test: Add test case for record sideband events")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: disgoel@linux.vnet.ibm.com
Link: https://lore.kernel.org/r/20231013073021.99794-3-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf tests: Ignore shellcheck warning in lock_contention
Athira Rajeev [Fri, 13 Oct 2023 07:30:19 +0000 (13:00 +0530)]
perf tests: Ignore shellcheck warning in lock_contention

Running shellcheck on lock_contention.sh generates the
warning below:

In tests/shell/lock_contention.sh line 36:
   if [ `nproc` -lt 4 ]; then
  ^-----^ SC2046: Quote this to prevent word splitting.

Here since nproc will generate a single word output
and there is no possibility of word splitting, this
warning can be ignored. Add an exception for it with the
"disable" option in shellcheck. This warning is observed
after commit 29441ab3a30a ("perf test lock_contention.sh:
Skip test if not enough CPUs").

Fixes: 29441ab3a30a ("perf test lock_contention.sh: Skip test if not enough CPUs")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20231013073021.99794-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agotools/perf/arch/powerpc: Fix the CPU ID const char* value by adding 0x prefix
Athira Rajeev [Mon, 9 Oct 2023 05:00:52 +0000 (10:30 +0530)]
tools/perf/arch/powerpc: Fix the CPU ID const char* value by adding 0x prefix

The simple expression parser test fails on powerpc as below:

    4: Simple expression parser
    test child forked, pid 170385
    Using CPUID 004e2102
    division by zero
    syntax error
    syntax error
    FAILED tests/expr.c:65 parse test failed
    test child finished with -1
    Simple expression parser: FAILED!

This is observed after commit 9d5da30e4ae9 ("perf jevents: Add a new
expression builtin strcmp_cpuid_str()").

With this commit, a new expression builtin strcmp_cpuid_str
got added. This function takes an 'ID' type value, which is
a string. So the expression parser for strcmp_cpuid_str expects
const char * as the cpuid value type. In the case of powerpc, CPU IDs
are numbers. Hence they don't get interpreted correctly by
the bison parser. For example, on power9 the cpuid string is returned
as: 004e2102

cpuid of string type is expected in two cases:
1. char *get_cpuid_str(struct perf_pmu *pmu __maybe_unused);

   Testcase "tests/expr.c" uses "perf_pmu__getcpuid" which calls
   get_cpuid_str to get the cpuid string.

2. cpuid field in struct pmu_events_map:

   struct pmu_events_map {
           const char *arch;
           const char *cpuid;

   Here cpuid field is used in "perf_pmu__find_events_table"
   function as "strcmp_cpuid_str(map->cpuid, cpuid)". The
   value for cpuid field is picked from mapfile.csv.

Fix the mapfile.csv and get_cpuid_str function to prefix
cpuid with 0x so that it gets correctly interpreted by
the bison parser.
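
As a minimal sketch of the idea (not the patch itself), the helper below
formats a numeric CPU ID as a string carrying a 0x prefix. The name
format_cpuid and its parameters are hypothetical and used only for
illustration.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: format a numeric CPU ID
 * as a zero-padded hexadecimal string with a 0x prefix.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int format_cpuid(unsigned long pvr, char *buf, size_t len)
{
	int n = snprintf(buf, len, "0x%08lx", pvr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Called with the power9 value quoted above and a 16-byte buffer, this
would store the string 0x004e2102.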

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel <disgoel@linux.ibm.com>
Cc: kjain@linux.ibm.com
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20231009050052.64935-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf cs-etm: Respect timestamp option
Leo Yan [Sat, 14 Oct 2023 07:41:59 +0000 (15:41 +0800)]
perf cs-etm: Respect timestamp option

When users pass the option '--timestamp' or '-T' to the record command,
all events set the PERF_SAMPLE_TIME bit in their attributes.  In this
case, the AUX event records the kernel timestamp, but that does not mean
Arm CoreSight enables timestamp packets in its hardware tracing.

If the option '--timestamp' or '-T' is set, this patch always enables
the Arm CoreSight timestamp; as a result, bit 28 in the event's config
is set.

Before:

  # perf record -e cs_etm// --per-thread --timestamp -- ls
  # perf script --header-only
  ...
  # event : name = cs_etm//, , id = { 69 }, type = 12, size = 136,
  config = 0, { sample_period, sample_freq } = 1,
  sample_type = IP|TID|TIME|CPU|IDENTIFIER, read_format = ID|LOST,
  disabled = 1, enable_on_exec = 1, sample_id_all = 1, exclude_guest = 1
  ...

After:

  # perf record -e cs_etm// --per-thread --timestamp -- ls
  # perf script --header-only
  ...
  # event : name = cs_etm//, , id = { 49 }, type = 12, size = 136,
  config = 0x10000000, { sample_period, sample_freq } = 1,
  sample_type = IP|TID|TIME|CPU|IDENTIFIER, read_format = ID|LOST,
  disabled = 1, enable_on_exec = 1, sample_id_all = 1, exclude_guest = 1
  ...
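
The change in the config value can be sketched as follows. This is an
illustration under stated assumptions, not the perf code: the function
name and the SKETCH_TIMESTAMP_BIT macro are hypothetical, and the bit
position (28) is taken from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: the timestamp option is bit 28 of config. */
#define SKETCH_TIMESTAMP_BIT 28

/* Return config with the timestamp bit set when the user asked for it. */
static uint64_t apply_timestamp_option(uint64_t config, bool timestamp_requested)
{
	if (timestamp_requested)
		config |= UINT64_C(1) << SKETCH_TIMESTAMP_BIT;
	return config;
}
```

Starting from config = 0, requesting the timestamp yields 0x10000000,
matching the "After" listing.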

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: James Clark <james.clark@arm.com>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231014074159.1667880-3-leo.yan@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf cs-etm: Validate timestamp tracing in per-thread mode
Leo Yan [Sat, 14 Oct 2023 07:41:58 +0000 (15:41 +0800)]
perf cs-etm: Validate timestamp tracing in per-thread mode

So far, it has been impossible to validate timestamp tracing in Arm
CoreSight when perf is in per-thread mode.  E.g. for the command:

  perf record -e cs_etm/timestamp/ --per-thread -- ls

The command enables config 'timestamp' for 'cs_etm' event in the
per-thread mode.  In this case, the function cs_etm_validate_config()
directly bails out and skips validation.

Given that the profiled process can be scheduled on any CPU in
per-thread mode, this patch validates timestamp tracing for all CPUs
when it detects that the CPU map is empty.
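
A rough sketch of that fallback, using hypothetical names throughout
(validate_cpus, cpu_check_fn and the example predicate are not perf
APIs): an empty list of requested CPUs stands in for the empty CPU map,
and in that case every CPU is checked.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU capability check supplied by the caller. */
typedef bool (*cpu_check_fn)(int cpu);

/* Example predicate for the sketch: pretend CPU 2 lacks timestamp support. */
static bool sketch_ts_supported(int cpu)
{
	return cpu != 2;
}

/*
 * Validate a feature on the requested CPUs. An empty request
 * (nr_requested == 0) models per-thread mode, where the task may run
 * anywhere, so every CPU in [0, nr_cpus) is checked instead.
 */
static bool validate_cpus(const int *requested, size_t nr_requested,
			  int nr_cpus, cpu_check_fn supported)
{
	if (nr_requested == 0) {
		for (int cpu = 0; cpu < nr_cpus; cpu++)
			if (!supported(cpu))
				return false;
		return true;
	}

	for (size_t i = 0; i < nr_requested; i++)
		if (!supported(requested[i]))
			return false;
	return true;
}
```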

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: James Clark <james.clark@arm.com>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231014074159.1667880-2-leo.yan@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf pmu: Lazily compute default config
Ian Rogers [Thu, 12 Oct 2023 17:56:45 +0000 (10:56 -0700)]
perf pmu: Lazily compute default config

The default config is computed during creation of the PMU and may do
things like scanning sysfs, when the PMU may just be used as part of
scanning. Change default_config to perf_event_attr_init_default, a
callback that is used when a default config needs initializing. This
avoids holding onto the memory for a perf_event_attr and copying.
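
The callback approach can be sketched roughly as below. The struct and
function names are simplified stand-ins invented for this example, not
the real perf definitions; the point is only that the default is
computed on demand rather than stored when the PMU is created.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; the real perf structures have many more fields. */
struct sketch_attr {
	uint64_t config;
};

struct sketch_pmu {
	const char *name;
	/* Called only when a default config actually needs initializing. */
	void (*attr_init_default)(const struct sketch_pmu *pmu,
				  struct sketch_attr *attr);
};

static int sketch_init_calls;	/* counts how often the default is computed */

static void sketch_default(const struct sketch_pmu *pmu, struct sketch_attr *attr)
{
	(void)pmu;
	sketch_init_calls++;	/* any expensive work would happen here */
	attr->config = 0x42;	/* arbitrary value for the example */
}

/* Initialize attr, computing the PMU's default lazily via the callback. */
static void sketch_attr_init(const struct sketch_pmu *pmu, struct sketch_attr *attr)
{
	attr->config = 0;
	if (pmu->attr_init_default)
		pmu->attr_init_default(pmu, attr);
}
```

Creating a struct sketch_pmu costs nothing here; the callback runs only
when sketch_attr_init() is called for an event that needs the default.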

On a tigerlake laptop running the pmu-scan benchmark:

Before:
Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
  Average core PMU scanning took: 28.780 usec (+- 0.503 usec)
  Average PMU scanning took: 283.480 usec (+- 18.471 usec)
Number of openat syscalls: 30,227

After:
Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
  Average core PMU scanning took: 27.880 usec (+- 0.169 usec)
  Average PMU scanning took: 245.260 usec (+- 15.758 usec)
Number of openat syscalls: 28,914

Over 3 runs it is a nearly 12% reduction in execution time and a 4.3%
reduction in openat calls.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf pmu-events: Remember the perf_events_map for a PMU
Ian Rogers [Thu, 12 Oct 2023 17:56:44 +0000 (10:56 -0700)]
perf pmu-events: Remember the perf_events_map for a PMU

strcmp_cpuid_str performs regular expression comparisons and so per
CPUID linear searches over the perf_events_map are expensive. Add a
helper function called map_for_pmu that does the search but also
caches the map specific to a PMU. As the PMU may differ, also cache
the CPUID string so that PMUs with the same CPUID string don't require
the linear search and regular expression comparisons. This speeds
loading PMUs as the search is done once per PMU to find the
appropriate tables.
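
The caching idea can be sketched as follows. Everything here is
hypothetical (names, table contents, CPUID strings), and plain strcmp
is swapped in for the regular expression comparison to keep the example
short.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct sketch_map {
	const char *cpuid;
	int table_id;
};

/* Made-up table contents for the example. */
static const struct sketch_map sketch_maps[] = {
	{ "cpu-a", 1 },
	{ "cpu-b", 2 },
};

static int sketch_searches;	/* counts linear searches, for demonstration */

/* Look up the map for a CPUID string, remembering the last successful hit. */
static const struct sketch_map *sketch_find_map(const char *cpuid)
{
	static char cached_cpuid[64];
	static const struct sketch_map *cached_map;

	if (cached_map && strcmp(cached_cpuid, cpuid) == 0)
		return cached_map;	/* same CPUID as last time: no search */

	sketch_searches++;
	for (size_t i = 0; i < sizeof(sketch_maps) / sizeof(sketch_maps[0]); i++) {
		if (strcmp(sketch_maps[i].cpuid, cpuid) == 0) {
			snprintf(cached_cpuid, sizeof(cached_cpuid), "%s", cpuid);
			cached_map = &sketch_maps[i];
			return cached_map;
		}
	}
	return NULL;
}
```

Repeated lookups with the same CPUID string hit the cache, so the
linear search runs only when the string changes.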

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf pmu: Const-ify perf_pmu__config_terms
Ian Rogers [Thu, 12 Oct 2023 17:56:43 +0000 (10:56 -0700)]
perf pmu: Const-ify perf_pmu__config_terms

Add const to related APIs, this is so they can be used to default
initialize a perf_event_attr from a const pmu.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf pmu: Const-ify file APIs
Ian Rogers [Thu, 12 Oct 2023 17:56:42 +0000 (10:56 -0700)]
perf pmu: Const-ify file APIs

File APIs don't alter the struct pmu so allow const ones to be passed.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf arm-spe: Move PMU initialization from default config code
Ian Rogers [Thu, 12 Oct 2023 17:56:41 +0000 (10:56 -0700)]
perf arm-spe: Move PMU initialization from default config code

Avoid setting PMU values in arm_spe_pmu_default_config, move to
perf_pmu__arch_init.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf intel-pt: Move PMU initialization from default config code
Ian Rogers [Thu, 12 Oct 2023 17:56:40 +0000 (10:56 -0700)]
perf intel-pt: Move PMU initialization from default config code

Avoid setting PMU values in intel_pt_pmu_default_config, move to
perf_pmu__arch_init.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf pmu: Rename perf_pmu__get_default_config to perf_pmu__arch_init
Ian Rogers [Thu, 12 Oct 2023 17:56:39 +0000 (10:56 -0700)]
perf pmu: Rename perf_pmu__get_default_config to perf_pmu__arch_init

Assign default_config as part of the init. perf_pmu__get_default_config
was doing more than just getting the default config and so this is
intended to better align with the code.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf intel-pt: Prefer get_unaligned_le64 to memcpy_le64
Adrian Hunter [Thu, 5 Oct 2023 19:04:51 +0000 (22:04 +0300)]
perf intel-pt: Prefer get_unaligned_le64 to memcpy_le64

Use get_unaligned_le64() instead of memcpy_le64(..., 8) because it produces
simpler code.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20231005190451.175568-6-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf intel-pt: Use get_unaligned_le16() etc
Adrian Hunter [Thu, 5 Oct 2023 19:04:50 +0000 (22:04 +0300)]
perf intel-pt: Use get_unaligned_le16() etc

Avoid unaligned access by using get_unaligned_le16(), get_unaligned_le32()
and get_unaligned_le64().
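
For illustration, unaligned little-endian reads can be written by
assembling the value byte by byte, as sketched below. This is not the
implementation used by the tools headers; the sketch_ prefixed names are
hypothetical and chosen to avoid clashing with the real helpers.

```c
#include <stdint.h>

/* Read a little-endian 16-bit value from a possibly unaligned address. */
static inline uint16_t sketch_get_unaligned_le16(const void *p)
{
	const uint8_t *b = p;

	return (uint16_t)(b[0] | (b[1] << 8));
}

/* Read a little-endian 32-bit value from a possibly unaligned address. */
static inline uint32_t sketch_get_unaligned_le32(const void *p)
{
	const uint8_t *b = p;

	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Read a little-endian 64-bit value as two 32-bit halves. */
static inline uint64_t sketch_get_unaligned_le64(const void *p)
{
	const uint8_t *b = p;

	return (uint64_t)sketch_get_unaligned_le32(b) |
	       ((uint64_t)sketch_get_unaligned_le32(b + 4) << 32);
}
```

Because only single bytes are dereferenced, the result is the same on
any host byte order and no misaligned load is performed.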

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20231005190451.175568-5-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf intel-pt: Use existing definitions of le16_to_cpu() etc
Adrian Hunter [Thu, 5 Oct 2023 19:04:49 +0000 (22:04 +0300)]
perf intel-pt: Use existing definitions of le16_to_cpu() etc

Use definitions from tools/include/linux/kernel.h

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20231005190451.175568-4-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf intel-pt: Simplify intel_pt_get_vmcs()
Adrian Hunter [Thu, 5 Oct 2023 19:04:48 +0000 (22:04 +0300)]
perf intel-pt: Simplify intel_pt_get_vmcs()

Simplify and remove unnecessary constant expressions.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20231005190451.175568-3-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf tools: Add get_unaligned_leNN()
Adrian Hunter [Thu, 5 Oct 2023 19:04:47 +0000 (22:04 +0300)]
perf tools: Add get_unaligned_leNN()

Add get_unaligned_le16(), get_unaligned_le32() and get_unaligned_le64(), same
as include/asm-generic/unaligned.h. And add include/asm-generic/unaligned.h
to check-headers.sh bringing tools/include/asm-generic/unaligned.h up to
date so that the kernel and tools versions match.

Use diagnostic pragmas to ignore -Wpacked used by perf build.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20231005190451.175568-2-adrian.hunter@intel.com
Link: https://lore.kernel.org/r/20231010142234.20061-1-adrian.hunter@intel.com
[ squashed check-headers.sh addition ]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf cs-etm: Fix incorrect or missing decoder for raw trace
Besar Wicaksono [Tue, 10 Oct 2023 23:48:03 +0000 (18:48 -0500)]
perf cs-etm: Fix incorrect or missing decoder for raw trace

The decoder creation for raw trace uses metadata from the first CPU.
In per-cpu mode, this metadata is incorrectly used for every decoder.
In per-process/per-thread traces, the first CPU is CPU0. If CPU0 trace
is not enabled, its metadata will be marked unused and the decoder is
not created. Perf report dump skips the decoding part because the
decoder is missing.

To fix this, use the metadata of the CPU associated with the sample object.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Reviewed-by: James Clark <james.clark@arm.com>
Cc: suzuki.poulose@arm.com
Cc: mike.leach@linaro.org
Cc: jonathanh@nvidia.com
Cc: rwiley@nvidia.com
Cc: treding@nvidia.com
Cc: vsethi@nvidia.com
Cc: ywan@nvidia.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Cc: linux-tegra@vger.kernel.org
Link: https://lore.kernel.org/r/20231010234803.5419-1-bwicaksono@nvidia.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf bpf_counter: Fix a few memory leaks
Ian Rogers [Mon, 9 Oct 2023 18:39:20 +0000 (11:39 -0700)]
perf bpf_counter: Fix a few memory leaks

Memory leaks were detected by clang-tidy.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-20-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf header: Fix various error path memory leaks
Ian Rogers [Mon, 9 Oct 2023 18:39:19 +0000 (11:39 -0700)]
perf header: Fix various error path memory leaks

Memory leaks were detected by clang-tidy.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-19-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf trace-event-info: Avoid passing NULL value to closedir
Ian Rogers [Mon, 9 Oct 2023 18:39:18 +0000 (11:39 -0700)]
perf trace-event-info: Avoid passing NULL value to closedir

If opendir failed then closedir was passed NULL which is
erroneous. Caught by clang-tidy.
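
The pattern can be sketched as below; sketch_count_entries is a
hypothetical function written for this example, not code from
trace-event-info.

```c
#include <dirent.h>
#include <stddef.h>

/* Count entries in a directory; returns -1 if it cannot be opened. */
static int sketch_count_entries(const char *path)
{
	DIR *dir = opendir(path);
	int n = 0;

	if (!dir)
		return -1;	/* nothing was opened, so closedir() is not called */

	while (readdir(dir))
		n++;

	closedir(dir);		/* only reached with a valid DIR pointer */
	return n;
}
```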

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-18-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agotools api: Avoid potential double free
Ian Rogers [Mon, 9 Oct 2023 18:39:17 +0000 (11:39 -0700)]
tools api: Avoid potential double free

io__getline will free the line on error but it doesn't clear the out
argument. This may lead to the line being freed twice, like in
tools/perf/util/srcline.c as detected by clang-tidy.
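
A minimal sketch of the defensive pattern, using a hypothetical reader
rather than io__getline itself: when the function frees the line on
error, it also clears the caller's pointer, so a later free() in the
caller is a harmless free(NULL).

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical reader for illustration: copies src into a new buffer
 * and treats a missing trailing newline as an error.
 * Returns 0 on success, -1 on error.
 */
static int sketch_read_line(const char *src, char **line_out)
{
	size_t len = strlen(src);

	*line_out = malloc(len + 1);
	if (!*line_out)
		return -1;
	memcpy(*line_out, src, len + 1);

	if (len == 0 || src[len - 1] != '\n') {
		free(*line_out);
		*line_out = NULL;	/* clear the out argument after freeing */
		return -1;
	}
	return 0;
}
```

A caller that unconditionally calls free(line) in its cleanup path is
then safe whether the read succeeded or failed.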

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-17-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf parse-events: Fix unlikely memory leak when cloning terms
Ian Rogers [Mon, 9 Oct 2023 18:39:16 +0000 (11:39 -0700)]
perf parse-events: Fix unlikely memory leak when cloning terms

Add missing free on an error path as detected by clang-tidy.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-16-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf lock: Fix a memory leak on an error path
Ian Rogers [Mon, 9 Oct 2023 18:39:15 +0000 (11:39 -0700)]
perf lock: Fix a memory leak on an error path

If a memory allocation fails then the strdup-ed string needs
freeing. Detected by clang-tidy.
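
The error path can be sketched as follows. The struct and function
names are hypothetical, and sketch_strdup is a local stand-in for
strdup() so the example stays plain ISO C.

```c
#include <stdlib.h>
#include <string.h>

struct sketch_stat {
	char *name;
	unsigned long *slots;
};

/* Portable stand-in for strdup(), kept local to the sketch. */
static char *sketch_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Allocate a stat entry; on any failure, release what was already allocated. */
static struct sketch_stat *sketch_stat_new(const char *name, size_t nr_slots)
{
	struct sketch_stat *st = calloc(1, sizeof(*st));

	if (!st)
		return NULL;

	st->name = sketch_strdup(name);
	if (!st->name)
		goto err_stat;

	st->slots = calloc(nr_slots, sizeof(*st->slots));
	if (!st->slots)
		goto err_name;

	return st;

err_name:
	free(st->name);		/* the duplicated string must not leak */
err_stat:
	free(st);
	return NULL;
}
```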

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-15-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf svghelper: Avoid memory leak
Ian Rogers [Mon, 9 Oct 2023 18:39:14 +0000 (11:39 -0700)]
perf svghelper: Avoid memory leak

On the success path the sib_core and sib_thr values weren't being
freed. Detected by clang-tidy.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-14-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf hists browser: Avoid potential NULL dereference
Ian Rogers [Mon, 9 Oct 2023 18:39:13 +0000 (11:39 -0700)]
perf hists browser: Avoid potential NULL dereference

On other code paths browser->he_selection is NULL checked, add a
missing case reported by clang-tidy.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-13-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf hists browser: Reorder variables to reduce padding
Ian Rogers [Mon, 9 Oct 2023 18:39:12 +0000 (11:39 -0700)]
perf hists browser: Reorder variables to reduce padding

Address clang-tidy warning:
```
tools/perf/ui/browsers/hists.c:2416:8: warning: Excessive padding in 'struct popup_action' (8 padding bytes, where 0 is optimal).
Optimal fields order:
time,
thread,
evsel,
fn,
ms,
socket,
rstype,
```
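
The effect of field ordering on padding can be illustrated with two
made-up structs (these are not struct popup_action, whose definition is
not shown here). Placing the widest members first typically removes the
interior padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Narrow fields surround a wide one, so padding is needed around it. */
struct sketch_padded {
	uint8_t  a;
	uint64_t b;
	uint8_t  c;
};

/* Same fields, widest first: the narrow fields share one tail slot. */
struct sketch_reordered {
	uint64_t b;
	uint8_t  a;
	uint8_t  c;
};
```

On common 64-bit ABIs, where uint64_t is 8-byte aligned, sizeof reports
24 bytes for the first layout and 16 for the second; exact sizes are
ABI dependent.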

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-12-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf dlfilter: Be defensive against potential NULL dereference
Ian Rogers [Mon, 9 Oct 2023 18:39:11 +0000 (11:39 -0700)]
perf dlfilter: Be defensive against potential NULL dereference

In the unlikely case of having a symbol without a mapping, avoid a
NULL dereference that clang-tidy warns about.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-11-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf mem-events: Avoid uninitialized read
Ian Rogers [Mon, 9 Oct 2023 18:39:10 +0000 (11:39 -0700)]
perf mem-events: Avoid uninitialized read

pmu should be initialized to NULL before the perf_pmus__scan loop. Fix
and shrink the scope of pmu at the same time. Issue detected by clang-tidy.

Fixes: 5752c20f3787 ("perf mem: Scan all PMUs instead of just core ones")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-10-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf jitdump: Avoid memory leak
Ian Rogers [Mon, 9 Oct 2023 18:39:09 +0000 (11:39 -0700)]
perf jitdump: Avoid memory leak

jit_repipe_unwinding_info is called in a loop by jit_process_dump,
avoid leaking unwinding_data by free-ing before overwriting. Error
detected by clang-tidy.
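
A sketch of the free-before-overwrite pattern, with a hypothetical
struct and setter; only the field name unwinding_data is borrowed from
the description above.

```c
#include <stdlib.h>
#include <string.h>

struct sketch_jit {
	void *unwinding_data;
	size_t size;
};

/*
 * Replace the stored unwinding data with a copy of data (size > 0).
 * Safe to call repeatedly: the previous buffer is released first.
 * Returns 0 on success, -1 on allocation failure.
 */
static int sketch_set_unwinding(struct sketch_jit *jit, const void *data, size_t size)
{
	void *copy = malloc(size);

	if (!copy)
		return -1;
	memcpy(copy, data, size);

	free(jit->unwinding_data);	/* free the old buffer before overwriting */
	jit->unwinding_data = copy;
	jit->size = size;
	return 0;
}
```

Without the free() call, each iteration of a loop calling this function
would leak the buffer stored by the previous iteration.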

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf env: Remove unnecessary NULL tests
Ian Rogers [Mon, 9 Oct 2023 18:39:08 +0000 (11:39 -0700)]
perf env: Remove unnecessary NULL tests

clang-tidy was warning:
```
util/env.c:334:23: warning: Access to field 'nr_pmu_mappings' results in a dereference of a null pointer (loaded from variable 'env') [clang-analyzer-core.NullDereference]
        env->nr_pmu_mappings = pmu_num;
```

The warning arises because the NULL tests imply that these functions may
be called when !env is true. That condition could never be true, as it
would produce a segv, so remove the unnecessary NULL tests and silence
clang-tidy.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf buildid-cache: Fix use of uninitialized value
Ian Rogers [Mon, 9 Oct 2023 18:39:07 +0000 (11:39 -0700)]
perf buildid-cache: Fix use of uninitialized value

The buildid filename is determined first and the buildid is then read
from it. If getting the filename fails, the buildid is left
uninitialized but is still used in a later memcmp. Detected by clang-tidy.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf bench uprobe: Fix potential use of memory after free
Ian Rogers [Mon, 9 Oct 2023 18:39:06 +0000 (11:39 -0700)]
perf bench uprobe: Fix potential use of memory after free

Found by clang-tidy:
```
bench/uprobe.c:98:3: warning: Use of memory after it is freed [clang-analyzer-unix.Malloc]
                bench_uprobe_bpf__destroy(skel);
```

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agorun-clang-tools: Add pass through checks and header-filter arguments
Ian Rogers [Mon, 9 Oct 2023 18:39:04 +0000 (11:39 -0700)]
run-clang-tools: Add pass through checks and header-filter arguments

Add a -checks argument to allow the checks passed to the clang-tool to
be set on the command line.

Add a pass through -header-filter option.

Don't run analysis on non-C or CPP files.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agogen_compile_commands: Sort output compile commands by file name
Ian Rogers [Mon, 9 Oct 2023 18:39:03 +0000 (11:39 -0700)]
gen_compile_commands: Sort output compile commands by file name

Make the output more stable and deterministic.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agogen_compile_commands: Allow the line prefix to still be cmd_
Ian Rogers [Mon, 9 Oct 2023 18:39:02 +0000 (11:39 -0700)]
gen_compile_commands: Allow the line prefix to still be cmd_

Builds in tools still use the cmd_ prefix in .cmd files, so don't
require the saved part. Name the groups in the line pattern match so
that changing the regular expression is more robust and works with the
addition of a new match group.
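
A minimal sketch of why named groups help, assuming a simplified line format. The pattern below is hypothetical and is not the one used by the script; it only shows that an optional "saved" prefix can be added as a new group without disturbing lookups done by name.

```python
import re

# Hypothetical pattern for illustration only; the optional "saved" prefix
# is its own group, and the interesting parts are looked up by name.
LINE_PATTERN = re.compile(
    r"^(?P<saved>saved)?cmd_(?P<target>\S+) := "
    r"(?P<command>.*\s(?P<source>\S+\.[cS]))\s*$"
)


def parse_cmd_line(line):
    """Return (target, source, command), or None if the line does not match."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    # Named lookups keep working even if groups are added or reordered.
    return match.group("target"), match.group("source"), match.group("command")


print(parse_cmd_line("cmd_util/foo.o := gcc -c -o util/foo.o util/foo.c"))
print(parse_cmd_line("savedcmd_util/foo.o := gcc -c -o util/foo.o util/foo.c"))
```

Both lines parse to the same tuple. With positional groups, inserting the "saved" group would have shifted every index after it.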

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf parse-events: Fix for term values that are raw events
Ian Rogers [Thu, 28 Sep 2023 00:44:31 +0000 (17:44 -0700)]
perf parse-events: Fix for term values that are raw events

Raw events can be strings like 'r0xead' but the 0x is optional so they
can also be 'read'. On IcelakeX uncore_imc_free_running has an event
called 'read' which may be programmed as:
```
$ perf stat -e 'uncore_imc_free_running/event=read/' -a sleep 1
```
However, the PE_RAW type isn't allowed on the right of a term, even
though in this case we just want to interpret it as a string. This
leads to the following error on IcelakeX:
```
$ perf stat -e 'uncore_imc_free_running/event=read/' -a sleep 1
event syntax error: '..nning/event=read/'
                                  \___ parser error
Run 'perf list' for a list of valid events

 Usage: perf stat [<options>] [<command>]

    -e, --event <event> event selector. use 'perf list' to list available events
```
Fix this by allowing raw types on the right of terms and treating them as
strings, just as is already done for PE_LEGACY_CACHE. Make this
consistent by just entirely removing name_or_legacy and always using
name_or_raw that covers all three cases.
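
The ambiguity can be sketched with a simplified stand-in for the raw-event rule described above ('r', an optional '0x', then hex digits). The regular expression and helper name are assumptions made for illustration, not the lexer's actual definition.

```python
import re

# Simplified stand-in for the raw-event rule: 'r', optional '0x', hex digits.
RAW_EVENT = re.compile(r"r(?:0x)?[0-9a-fA-F]+")


def looks_like_raw_event(token):
    """True if the whole token has the shape of a raw event."""
    return RAW_EVENT.fullmatch(token) is not None


for token in ("r0xead", "read", "write"):
    print(token, looks_like_raw_event(token))
```

'read' has the same shape as 'r0xead' with the prefix dropped, so a term value spelled 'read' gets classified as a raw event unless the grammar also accepts that token type as a plain string.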

Fixes: 6fd1e5191591 ("perf parse-events: Support PMUs for legacy cache events")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230928004431.1926969-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf build: Add missing comment about NO_LIBTRACEEVENT=1
Arnaldo Carvalho de Melo [Thu, 5 Oct 2023 13:46:26 +0000 (10:46 -0300)]
perf build: Add missing comment about NO_LIBTRACEEVENT=1

By default perf will fail the build if the development files for
libtraceevent are not available.

To build perf without libtraceevent support, disabling several features
such as 'perf trace', one needs to add NO_LIBTRACEEVENT=1 to the make
command line.

Add the missing comments about that to the tools/perf/Makefile.perf
file, just like all the other such command line toggles.

Fixes: 378ef0f5d9d7f465 ("perf build: Use libtraceevent from the system")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/ZR6+MhXtLnv6ow6E@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf symbols: Add 'intel_idle_ibrs' to the list of idle symbols
Arnaldo Carvalho de Melo [Thu, 5 Oct 2023 13:29:38 +0000 (10:29 -0300)]
perf symbols: Add 'intel_idle_ibrs' to the list of idle symbols

This is a longstanding to-do list entry: we need a way to see that a
sample took place while in idle state, as the current way to do it is
to infer that from the names of the functions that have more samples
in that state, IOW: a hack.

Maybe we can flip a bit in samples that take place inside the
enter/exit idle section in do_idle()?

But till then, add one more :-\

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/ZR66Qgbcltt+zG7F@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf parse-events: Avoid erange from hex numbers
Ian Rogers [Thu, 7 Sep 2023 21:05:33 +0000 (14:05 -0700)]
perf parse-events: Avoid erange from hex numbers

We specify that a "num_hex" comprises 1 or more digits, however, that
allows strtoull to fail with ERANGE. Limit the number of hex digits to
being between 1 and 16.
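
The digit limit can be checked with a little arithmetic: 16 hex digits encode at most 2^64 - 1, so any accepted token is guaranteed to fit in an unsigned 64-bit value. The sketch below uses Python integers to model that; the helper names are made up, and the digit string is the one from the failing command line shown below.

```python
U64_MAX = 2**64 - 1


def hex_fits_u64(digits):
    """True if the hex string can be stored in an unsigned 64-bit value."""
    return int(digits, 16) <= U64_MAX


def within_digit_limit(digits, max_digits=16):
    """Model of the lexer-level guard: between 1 and max_digits hex digits."""
    return 1 <= len(digits) <= max_digits


digits = "E7574c47490475745"  # 17 hex digits, one more than a u64 can hold
print(len(digits), hex_fits_u64(digits), within_digit_limit(digits))
```

Rejecting the token by length means the conversion routine never sees a value that could overflow.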

Before:
```
$ perf stat -e 'cpu/rE7574c47490475745/' true
perf: util/parse-events.c:215: fix_raw: Assertion `errno == 0' failed.
Aborted (core dumped)
```

After:
```
$ perf stat -e 'cpu/rE7574c47490475745/' true
event syntax error: 'cpu/rE7574c47490475745/'
                         \___ Bad event or PMU

Unable to find PMU or event on a PMU of 'cpu'

Initial error:
event syntax error: 'cpu/rE7574c47490475745/'
                         \___ unknown term 'rE7574c47490475745' for pmu 'cpu'

valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,config3,name,period,percore,metric-id
Run 'perf list' for a list of valid events

 Usage: perf stat [<options>] [<command>]

    -e, --event <event>   event selector. use 'perf list' to list available events
```

Issue found through fuzz testing.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230907210533.3712979-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoMerge tag 'perf-tools-fixes-for-v6.6-1-2023-09-25' into perf-tools-next
Arnaldo Carvalho de Melo [Tue, 10 Oct 2023 20:36:36 +0000 (17:36 -0300)]
Merge tag 'perf-tools-fixes-for-v6.6-1-2023-09-25' into perf-tools-next

To pick up the 'perf bench sched-seccomp-notify' changes to allow us to
continue build testing perf-tools-next with the set of distro
containers, where some older ones don't have a recent enough seccomp.h
UAPI header that contains defines needed by this new 'perf bench'
workload.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 months agotools/perf: Update call stack check in builtin-lock.c
Kajol Jain [Tue, 3 Oct 2023 09:21:13 +0000 (14:51 +0530)]
tools/perf: Update call stack check in builtin-lock.c

The perf test named "kernel lock contention analysis test"
fails on powerpc systems with the error below:

  [command]# ./perf test 81 -vv
   81: kernel lock contention analysis test                            :
   --- start ---
  test child forked, pid 2140
  Testing perf lock record and perf lock contention
  Testing perf lock contention --use-bpf
  [Skip] No BPF support
  Testing perf lock record and perf lock contention at the same time
  Testing perf lock contention --threads
  Testing perf lock contention --lock-addr
  Testing perf lock contention --type-filter (w/ spinlock)
  Testing perf lock contention --lock-filter (w/ tasklist_lock)
  Testing perf lock contention --callstack-filter (w/ unix_stream)
  [Fail] Recorded result should have a lock from unix_stream:
  test child finished with -1
   ---- end ----
  kernel lock contention analysis test: FAILED!

The test is failing because we get an address entry with 0 in
perf lock samples for powerpc, and code for lock contention
option "--callstack-filter" will not check further entries after
address 0.

Below are some of the samples from test generated perf.data file, which
have 0 address in the 2nd entry of callstack:
 --------
sched-messaging    3409 [001]  7152.904029: lock:contention_begin: 0xc00000c80904ef00 (flags=SPIN)
        c0000000001e926c __traceiter_contention_begin+0x6c ([kernel.kallsyms])
                       0 [unknown] ([unknown])
        c000000000f8a178 native_queued_spin_lock_slowpath+0x1f8 ([kernel.kallsyms])
        c000000000f89f44 _raw_spin_lock_irqsave+0x84 ([kernel.kallsyms])
        c0000000001d9fd0 prepare_to_wait+0x50 ([kernel.kallsyms])
        c000000000c80f50 sock_alloc_send_pskb+0x1b0 ([kernel.kallsyms])
        c000000000e82298 unix_stream_sendmsg+0x2b8 ([kernel.kallsyms])
        c000000000c78980 sock_sendmsg+0x80 ([kernel.kallsyms])

sched-messaging    3408 [005]  7152.904036: lock:contention_begin: 0xc00000c80904ef00 (flags=SPIN)
        c0000000001e926c __traceiter_contention_begin+0x6c ([kernel.kallsyms])
                       0 [unknown] ([unknown])
        c000000000f8a178 native_queued_spin_lock_slowpath+0x1f8 ([kernel.kallsyms])
        c000000000f89f44 _raw_spin_lock_irqsave+0x84 ([kernel.kallsyms])
        c0000000001d9fd0 prepare_to_wait+0x50 ([kernel.kallsyms])
        c000000000c80f50 sock_alloc_send_pskb+0x1b0 ([kernel.kallsyms])
        c000000000e82298 unix_stream_sendmsg+0x2b8 ([kernel.kallsyms])
        c000000000c78980 sock_sendmsg+0x80 ([kernel.kallsyms])
 --------

Based on commit 20002ded4d93 ("perf_counter: powerpc: Add callchain support"),
in the case of powerpc, the callchain saved by the kernel always includes as its
first three entries the NIP (next instruction pointer), the LR (link register), and
the contents of the LR save area in the second stack frame. In certain scenarios
it is possible to have invalid kernel instruction addresses in either the LR or the
second stack frame's LR. In that case, the kernel stores the address as zero.
Hence, it is possible for the 2nd or 3rd callstack entry to be 0.

As per the current code in the match_callstack_filter function, we stop checking the
callstack as soon as we get a 0 address. Hence the test case fails on powerpc.

Fix this issue by updating the check in match_callstack_filter function,
to not skip callstack check if the 2nd or 3rd entry have 0 address
for powerpc.
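
A minimal model of the changed check, written in Python rather than the tool's C. The function signature, the is_powerpc flag and the symbols dictionary are hypothetical; the addresses and symbol names come from the sample callstack quoted above.

```python
def match_callstack_filter(callstack, symbols, filters, is_powerpc=False):
    """Return True if any resolved callstack entry contains a filter string."""
    for index, ip in enumerate(callstack):
        if ip == 0:
            # On powerpc the 2nd and 3rd entries (index 1 and 2) may be 0
            # without marking the end of the callstack, so keep scanning.
            if is_powerpc and index in (1, 2):
                continue
            break
        name = symbols.get(ip)
        if name is not None and any(f in name for f in filters):
            return True
    return False


# Entries taken from the first sample above; entry 2 is the 0 address.
callstack = [
    0xc0000000001e926c, 0, 0xc000000000f8a178, 0xc000000000f89f44,
    0xc0000000001d9fd0, 0xc000000000c80f50, 0xc000000000e82298,
    0xc000000000c78980,
]
symbols = {
    0xc0000000001e926c: "__traceiter_contention_begin",
    0xc000000000f8a178: "native_queued_spin_lock_slowpath",
    0xc000000000f89f44: "_raw_spin_lock_irqsave",
    0xc0000000001d9fd0: "prepare_to_wait",
    0xc000000000c80f50: "sock_alloc_send_pskb",
    0xc000000000e82298: "unix_stream_sendmsg",
    0xc000000000c78980: "sock_sendmsg",
}

print(match_callstack_filter(callstack, symbols, ["unix_stream"]))
print(match_callstack_filter(callstack, symbols, ["unix_stream"], is_powerpc=True))
```

Without the powerpc exception the scan stops at the second entry and never reaches unix_stream_sendmsg, which mirrors the failing test.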

Result in powerpc after patch changes:

  [command]# ./perf test 81 -vv
   81: kernel lock contention analysis test                            :
   --- start ---
  test child forked, pid 4570
  Testing perf lock record and perf lock contention
  Testing perf lock contention --use-bpf
  [Skip] No BPF support
  Testing perf lock record and perf lock contention at the same time
  Testing perf lock contention --threads
  Testing perf lock contention --lock-addr
  Testing perf lock contention --type-filter (w/ spinlock)
  Testing perf lock contention --lock-filter (w/ tasklist_lock)
  [Skip] Could not find 'tasklist_lock'
  Testing perf lock contention --callstack-filter (w/ unix_stream)
  Testing perf lock contention --callstack-filter with task aggregation
  Testing perf lock contention CSV output
  [Skip] No BPF support
  test child finished with 0
   ---- end ----
  kernel lock contention analysis test: Ok

Fixes: ebab291641be ("perf lock contention: Support filters for different aggregation")
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Tested-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Cc: maddy@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Link: https://lore.kernel.org/r/20231003092113.252380-1-kjain@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agotools/perf/tests: Fix object code reading to skip address that falls out of text...
Athira Rajeev [Thu, 28 Sep 2023 07:52:13 +0000 (13:22 +0530)]
tools/perf/tests: Fix object code reading to skip address that falls out of text section

The testcase "Object code reading" fails in some cases
for "fs_something" sub test as below:

    Reading object code for memory address: 0xc008000007f0142c
    File is: /lib/modules/6.5.0-rc3+/kernel/fs/xfs/xfs.ko
    On file address is: 0x1114cc
    Objdump command is: objdump -z -d --start-address=0x11142c --stop-address=0x1114ac /lib/modules/6.5.0-rc3+/kernel/fs/xfs/xfs.ko
    objdump read too few bytes: 128
    test child finished with -1

This can also be reproduced when running perf record with a
workload that exercises fs_something() code. In the test
setup, this is exercising xfs code since root is xfs.

    # perf record ./a.out
    # perf report -v |grep "xfs.ko"
      0.76% a.out /lib/modules/6.5.0-rc3+/kernel/fs/xfs/xfs.ko  0xc008000007de5efc B [k] xlog_cil_commit
      0.74% a.out  /lib/modules/6.5.0-rc3+/kernel/fs/xfs/xfs.ko  0xc008000007d5ae18 B [k] xfs_btree_key_offset
      0.74% a.out  /lib/modules/6.5.0-rc3+/kernel/fs/xfs/xfs.ko  0xc008000007e11fd4 B [k] 0x0000000000112074

Here addr "0xc008000007e11fd4" is not resolved. Since this is a
kernel module, its offset is from the DSO. The xfs module is loaded
at 0xc008000007d00000:

   # cat /proc/modules | grep xfs
    xfs 2228224 3 - Live 0xc008000007d00000

And the size is 0x220000. So it's loaded between 0xc008000007d00000
and 0xc008000007f20000. From objdump, the text section is:
    text 0010f7bc  0000000000000000 0000000000000000 000000a0 2**4

Hence perf captured ip maps to 0x112074 which is:
( ip - start of module ) + a0

This offset 0x112074 falls outside the .text section, which is up to 0x10f7bc.
In this case for module, the address 0xc008000007e11fd4 is pointing
to stub instructions. This address range represents the module stubs
which is allocated on module load and hence is not part of DSO offset.
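
The arithmetic above can be replayed directly. All numbers are the ones quoted in this message; only the variable names are invented for the sketch.

```python
MODULE_START = 0xc008000007d00000   # load address from /proc/modules
MODULE_SIZE = 0x220000              # 2228224 bytes, from /proc/modules
TEXT_FILE_OFFSET = 0xa0             # .text file offset reported by objdump
TEXT_SIZE = 0x10f7bc                # .text size reported by objdump

ip = 0xc008000007e11fd4             # the unresolved sample address

module_end = MODULE_START + MODULE_SIZE
file_offset = (ip - MODULE_START) + TEXT_FILE_OFFSET

print(hex(module_end))                               # end of the module mapping
print(hex(file_offset))                              # offset perf reports
print(MODULE_START <= ip < module_end)               # inside the module range?
print(file_offset > TEXT_FILE_OFFSET + TEXT_SIZE)    # past the end of .text?
```

The address lies inside the module's mapped range, yet its offset is beyond the end of .text, which is the combination that identifies the stub area.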

To address this issue in "object code reading", skip the sample if
the address falls outside the text section and is within the module end.
Use the "text_end" member of "struct dso" to do this check.

To address this issue in "perf report", we are exploring the option of
having the stubs range as part of /proc/kallsyms, so that perf
report can resolve addresses in the stubs range.

However, this patch uses text_end to skip the stub range for the
"Object code reading" testcase.

Reported-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel <disgoel@linux.ibm.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230928075213.84392-3-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agotools/perf: Add "is_kmod" to struct dso to check if it is kernel module
Athira Rajeev [Thu, 28 Sep 2023 07:52:12 +0000 (13:22 +0530)]
tools/perf: Add "is_kmod" to struct dso to check if it is kernel module

Update "struct dso" to include new member "is_kmod".
This new field will determine if the file is a kernel
module or not.

To resolve the address from a sample, perf looks at the
DSO maps. For addresses from a kernel module, some
addresses were found to be unresolved. This was
observed while running perf test for "Object code reading".
Though the ip falls between the start address of the loaded
module (perf map->start) and the end address (perf map->end),
it was unresolved.

This was happening because in some cases for kernel
modules, the address from a sample points to stub instructions.
To identify if the DSO is a kernel module, the new field
"is_kmod" is added to "struct dso".

Reported-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: kjain@linux.ibm.com
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230928075213.84392-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agotools/perf: Add text_end to "struct dso" to save .text section size
Athira Rajeev [Thu, 28 Sep 2023 07:52:11 +0000 (13:22 +0530)]
tools/perf: Add text_end to "struct dso" to save .text section size

Update "struct dso" to include new member "text_end".
This new field will represent the offset for end of text
section for a dso. For elf, this value is derived as:
sh_size (Size of section in bytes) + sh_offset (Section file
offset) of the elf header for text.

For bfd, this value is derived as:
1. For PE file,
section->size + ( section->vma - dso->text_offset)
2. Other cases:
section->filepos (file position) + section->size (size of
section)
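
The three derivations can be written out as plain functions. This is a sketch of the formulas as stated above, not the perf implementation; the function and parameter names are made up, and the PE numbers in the usage lines are hypothetical. The ELF example uses the xfs.ko .text values quoted further down in this message.

```python
def elf_text_end(sh_offset, sh_size):
    """ELF case: section file offset plus section size."""
    return sh_offset + sh_size


def bfd_text_end(size, filepos=None, vma=None, text_offset=None, is_pe=False):
    """bfd case: PE files use the vma relative to text_offset, others filepos."""
    if is_pe:
        return size + (vma - text_offset)
    return filepos + size


# xfs.ko: .text size 0x10f7bc at file offset 0xa0.
print(hex(elf_text_end(sh_offset=0xa0, sh_size=0x10f7bc)))
print(hex(bfd_text_end(size=0x10f7bc, filepos=0xa0)))
# Hypothetical PE section values, for illustration only.
print(hex(bfd_text_end(size=0x1000, vma=0x401000, text_offset=0x400000, is_pe=True)))
```

For a non-PE file the ELF and bfd formulas agree, since both add the section size to its position in the file.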

To resolve the address from a sample, perf looks at the
DSO maps. For addresses from a kernel module, some
addresses were found to be unresolved. This was
observed while running perf test for "Object code reading".
Though the ip falls between the start address of the loaded
module (perf map->start) and the end address (perf map->end),
it was unresolved.

Example:

    Reading object code for memory address: 0xc008000007f0142c
    File is: /lib/modules/6.5.0-rc3+/kernel/fs/xfs/xfs.ko
    On file address is: 0x1114cc
    Objdump command is: objdump -z -d --start-address=0x11142c --stop-address=0x1114ac /lib/modules/6.5.0-rc3+/kernel/fs/xfs/xfs.ko
    objdump read too few bytes: 128
    test child finished with -1

Here, module is loaded at:
    # cat /proc/modules | grep xfs
    xfs 2228224 3 - Live 0xc008000007d00000

From objdump for xfs module, text section is:
    text 0010f7bc  0000000000000000 0000000000000000 000000a0 2**4

Here the offset for 0xc008000007f0142c, i.e. 0x112074, falls outside
the .text section, which is up to 0x10f7bc.

In this case for module, the address 0xc008000007e11fd4 is pointing
to stub instructions. This address range represents the module stubs
which is allocated on module load and hence is not part of DSO offset.

To identify such an address, which falls outside the text
section but within the module end, add the new field "text_end" to
"struct dso".

Reported-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230928075213.84392-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf test: Avoid system wide when not privileged
Ian Rogers [Sat, 30 Sep 2023 06:02:06 +0000 (23:02 -0700)]
perf test: Avoid system wide when not privileged

Switch the test program to sleep, which makes more sense for system wide
events. Only enable system wide when root or not paranoid. This avoids
failures under some testing conditions like ARM cloud.
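
The gating condition can be modelled as a small predicate. This is a sketch, not the test's code: the function name is invented, and the threshold (a perf_event_paranoid value below 1 counts as "not paranoid") is an assumption based on the kernel's documented meaning of that sysctl.

```python
def may_use_system_wide(euid, paranoid_level):
    """Decide whether a test should attempt system wide events."""
    # Assumption: root may always do so; otherwise the perf_event_paranoid
    # value must be below 1 for CPU-wide events to be permitted.
    return euid == 0 or paranoid_level < 1


print(may_use_system_wide(euid=0, paranoid_level=2))     # root
print(may_use_system_wide(euid=1000, paranoid_level=2))  # unprivileged, paranoid
print(may_use_system_wide(euid=1000, paranoid_level=0))  # unprivileged, relaxed
```

A caller would pass the effective user id and the value read from /proc/sys/kernel/perf_event_paranoid, and fall back to a per-process run when the predicate is false.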

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230930060206.2353141-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf hisi-ptt: Fix memory leak in lseek failure handling
Kuan-Wei Chiu [Sat, 30 Sep 2023 07:27:19 +0000 (15:27 +0800)]
perf hisi-ptt: Fix memory leak in lseek failure handling

In the previous code, there was a memory leak issue where the previously
allocated memory was not freed upon a failed lseek operation. This patch
addresses the problem by releasing the old memory before returning -errno
in case of a lseek failure. This ensures that memory is properly managed
and avoids potential memory leaks.

Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: yangyicong@hisilicon.com
Cc: jonathan.cameron@huawei.com
Link: https://lore.kernel.org/r/20230930072719.1267784-1-visitorckw@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf intel-pt: Fix async branch flags
Adrian Hunter [Thu, 28 Sep 2023 07:29:53 +0000 (10:29 +0300)]
perf intel-pt: Fix async branch flags

Ensure PERF_IP_FLAG_ASYNC is set always for asynchronous branches (i.e.
interrupts etc).

Fixes: 90e457f7be08 ("perf tools: Add Intel PT support")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230928072953.19369-1-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf pmus: Make PMU alias name loading lazy
Ian Rogers [Mon, 25 Sep 2023 06:23:23 +0000 (23:23 -0700)]
perf pmus: Make PMU alias name loading lazy

PMU alias names were computed when the first perf_pmu is created,
scanning all PMUs in event sources for a file called alias that
generally doesn't exist. Switch to trying to load the file when all
PMU related files are loaded in lookup. This would cause a PMU name
lookup of an alias name to fail if no PMUs were loaded, so in that
case all PMUs are loaded and the find repeated. The overhead is
similar but in the (very) general case not all PMUs are scanned for
the alias file.

As the overhead occurs once per invocation it doesn't show in perf
bench internals pmu-scan. On a tigerlake machine, the number of openat
system calls for an event of cpu/cycles/ with perf stat reduces from
94 to 69 (ie 25 fewer openat calls).

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230925062323.840799-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf test: Fix parse-events tests to skip parametrized events
Athira Rajeev [Wed, 27 Sep 2023 18:17:03 +0000 (23:47 +0530)]
perf test: Fix parse-events tests to skip parametrized events

Testcase "Parsing of all PMU events from sysfs" parses events for
all PMUs, and not just cpu. In case of powerpc, the PowerVM
environment supports events from the hv_24x7 and hv_gpci PMUs, which
have a format like the examples below:

- hv_24x7/CPM_ADJUNCT_INST,domain=?,core=?/
- hv_gpci/event,partition_id=?/

The value for "?" needs to be filled in depending on system
configuration. It is better to skip these parametrized events
in this test as it is done in:
'commit b50d691e50e6 ("perf test: Fix "all PMU test" to skip
parametrized events")' which handled a similar instance with
"all PMU test".

Fix parse-events test to skip parametrized events since
it needs proper setup of the parameters.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230927181703.80936-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf vendor events: Add JSON metrics for Arm CMN
Jing Zhang [Wed, 27 Sep 2023 05:59:51 +0000 (13:59 +0800)]
perf vendor events: Add JSON metrics for Arm CMN

Add JSON metrics for Arm CMN. Currently just add part of CMN PMU
metrics which are general and compatible for any SoC with CMN-ANY.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Zhuo Song <zhuo.song@linux.alibaba.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/1695794391-34817-8-git-send-email-renyu.zj@linux.alibaba.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf jevents: Add support for Arm CMN PMU aliasing
Jing Zhang [Wed, 27 Sep 2023 05:59:50 +0000 (13:59 +0800)]
perf jevents: Add support for Arm CMN PMU aliasing

Currently just add aliases for part of Arm CMN PMU events which
are general and compatible for any SoC and CMN-ANY.

"Compat" value "(434|436|43c|43a).*" means it is compatible with
all CMN600/CMN650/CMN700/Ci700, which can be obtained from
commit 7819e05a0dce ("perf/arm-cmn: Revamp model detection").

The arm-cmn PMU events were obtained from:
[0] https://developer.arm.com/documentation/100180/0302/?lang=en
[1] https://developer.arm.com/documentation/101408/0100/?lang=en
[2] https://developer.arm.com/documentation/102308/0302/?lang=en
[3] https://developer.arm.com/documentation/101569/0300/?lang=en

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Zhuo Song <zhuo.song@linux.alibaba.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/1695794391-34817-7-git-send-email-renyu.zj@linux.alibaba.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf test: Add pmu-event test for "Compat" and new event_field.
Jing Zhang [Wed, 27 Sep 2023 05:59:49 +0000 (13:59 +0800)]
perf test: Add pmu-event test for "Compat" and new event_field.

Add new event test for uncore system event which is used to verify the
functionality of "Compat" matching multiple identifiers and the new event
fields "EventidCode" and "NodeType".

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Zhuo Song <zhuo.song@linux.alibaba.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/1695794391-34817-6-git-send-email-renyu.zj@linux.alibaba.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf test: Make matching_pmu effective
Jing Zhang [Wed, 27 Sep 2023 05:59:48 +0000 (13:59 +0800)]
perf test: Make matching_pmu effective

The perf_pmu_test_event.matching_pmu didn't work. No matter what its
value is, it does not affect the test results. So let matching_pmu be
used for matching perf_pmu_test_pmu.pmu.name.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Zhuo Song <zhuo.song@linux.alibaba.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/1695794391-34817-5-git-send-email-renyu.zj@linux.alibaba.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf jevents: Support EventidCode and NodeType
Jing Zhang [Wed, 27 Sep 2023 05:59:47 +0000 (13:59 +0800)]
perf jevents: Support EventidCode and NodeType

The previous code assumes an event has either an "event=" or "config"
field at the beginning. For CMN neither of these may be present, as an
event is typically "type=xx,eventid=xxx".

So add EventidCode and NodeType to support CMN event description.

I compared pmu_event.c before and after compiling with JEVENT_ARCH=all;
they are consistent.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Zhuo Song <zhuo.song@linux.alibaba.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/1695794391-34817-4-git-send-email-renyu.zj@linux.alibaba.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf metric: "Compat" supports regular expression matching identifiers
Jing Zhang [Wed, 27 Sep 2023 05:59:46 +0000 (13:59 +0800)]
perf metric: "Compat" supports regular expression matching identifiers

The jevent "Compat" is used for uncore PMU alias or metric definitions.

The same PMU driver has different PMU identifiers due to different
hardware versions and types, but they may have some common PMU metric.
Since a Compat value can only match one identifier, when adding the
same metric to PMUs with different identifiers, each identifier needs
to be defined once, which is not streamlined enough.

So let "Compat" support using regular expression to match multiple
identifiers for uncore PMU metric.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Zhuo Song <zhuo.song@linux.alibaba.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/1695794391-34817-3-git-send-email-renyu.zj@linux.alibaba.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf pmu: "Compat" supports regular expression matching identifiers
Jing Zhang [Wed, 27 Sep 2023 05:59:45 +0000 (13:59 +0800)]
perf pmu: "Compat" supports regular expression matching identifiers

The jevent "Compat" is used for uncore PMU alias or metric definitions.

The same PMU driver has different PMU identifiers due to different
hardware versions and types, but they may have some common PMU event.
Since a Compat value can only match one identifier, when adding the
same event alias to PMUs with different identifiers, each identifier
needs to be defined once, which is not streamlined enough.

So let "Compat" support using regular expression to match identifiers
for uncore PMU alias. For example, if the "Compat" value is set to
"43401|43c01", it would be able to match PMU identifiers such as "43401"
or "43c01", which correspond to CMN600_r0p0 or CMN700_r0p0.
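
The matching behaviour in the example can be sketched as below. Python's re module is used only as a convenient stand-in and its syntax is not identical to whatever regular expression engine perf uses; the helper name and the whole-string matching rule are assumptions of this sketch.

```python
import re


def compat_matches(compat, identifier):
    """Whole-string match of a PMU identifier against a Compat expression."""
    return re.fullmatch(compat, identifier) is not None


compat = "43401|43c01"
for identifier in ("43401", "43c01", "43601"):
    print(identifier, compat_matches(compat, identifier))
```

Requiring the whole identifier to match keeps a short alternative from accidentally matching a longer, unrelated identifier that merely contains it.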

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Zhuo Song <zhuo.song@linux.alibaba.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/1695794391-34817-2-git-send-email-renyu.zj@linux.alibaba.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
11 months agoperf record: Fix BTF type checks in the off-cpu profiling
Namhyung Kim [Fri, 22 Sep 2023 23:44:44 +0000 (16:44 -0700)]
perf record: Fix BTF type checks in the off-cpu profiling

The BTF func proto for a tracepoint has one more argument than the
actual tracepoint function since it has a context argument at the
beginning.  So it should compare to 5 when the tracepoint has 4
arguments.

  typedef void (*btf_trace_sched_switch)(void *, bool, struct task_struct *, struct task_struct *, unsigned int);

Also, a recent change in the perf tool uses a hand-written minimal
vmlinux.h to generate BTF in the skeleton, so it won't have the info
for the tracepoint.  In any case, it should use the kernel's vmlinux BTF to
check the type in the kernel.

Fixes: b36888f71c85 ("perf record: Handle argument change in sched_switch")
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Song Liu <song@kernel.org>
Cc: Hao Luo <haoluo@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230922234444.3115821-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>