Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump urllib3 from 2.0.4 to 2.0.7 in /drivers/gpu/drm/ci/xfails #3

Conversation

dependabot[bot]
Copy link

@dependabot dependabot bot commented on behalf of github Nov 2, 2023

Bumps urllib3 from 2.0.4 to 2.0.7.

Release notes

Sourced from urllib3's releases.

2.0.7

  • Made body stripped from HTTP requests changing the request method to GET after HTTP 303 "See Other" redirect responses. (GHSA-g4mx-q9vg-27p4)

2.0.6

  • Added the Cookie header to the list of headers to strip from requests when redirecting to a different host. As before, different headers can be set via Retry.remove_headers_on_redirect. (GHSA-v845-jxx5-vc9f)

2.0.5

  • Allowed pyOpenSSL third-party module without any deprecation warning. #3126
  • Fixed default blocksize of HTTPConnection classes to match high-level classes. Previously was 8KiB, now 16KiB. #3066
Changelog

Sourced from urllib3's changelog.

2.0.7 (2023-10-17)

  • Made body stripped from HTTP requests changing the request method to GET after HTTP 303 "See Other" redirect responses.

2.0.6 (2023-10-02)

  • Added the Cookie header to the list of headers to strip from requests when redirecting to a different host. As before, different headers can be set via Retry.remove_headers_on_redirect.

2.0.5 (2023-09-20)

  • Allowed pyOpenSSL third-party module without any deprecation warning. ([#3126](https:/urllib3/urllib3/issues/3126) <https:/urllib3/urllib3/issues/3126>__)
  • Fixed default blocksize of HTTPConnection classes to match high-level classes. Previously was 8KiB, now 16KiB. ([#3066](https:/urllib3/urllib3/issues/3066) <https:/urllib3/urllib3/issues/3066>__)
Commits

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    You can disable automated security fix PRs for this repo from the Security Alerts page.

Bumps [urllib3](https:/urllib3/urllib3) from 2.0.4 to 2.0.7.
- [Release notes](https:/urllib3/urllib3/releases)
- [Changelog](https:/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](urllib3/urllib3@2.0.4...2.0.7)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot bot added the dependencies Pull requests that update a dependency file label Nov 2, 2023
ColinIanKing pushed a commit that referenced this pull request Nov 7, 2023
Chuyi Zhou says:

====================
Relax allowlist for open-coded css_task iter

Hi,
The patchset aims to relax the allowlist for open-coded css_task iter
suggested by Alexei[1].

Please see individual patches for more details. And comments are always
welcome.

Patch summary:
 * Patch #1: Relax the allowlist and let css_task iter can be used in
   bpf iters and any sleepable progs.
 * Patch #2: Add a test in cgroup_iters.c which demonstrates how
   css_task iters can be combined with cgroup iter.
 * Patch #3: Add a test to prove css_task iter can be used in normal
 * sleepable progs.
link[1]:https://lore.kernel.org/lkml/CAADnVQKafk_junRyE=-FVAik4hjTRDtThymYGEL8hGTuYoOGpA@mail.gmail.com/
---

Changes in v2:
 * Fix the incorrect logic in check_css_task_iter_allowlist. Use
   expected_attach_type to check whether we are using bpf_iters.
 * Link to v1:https://lore.kernel.org/bpf/[email protected]/T/#m946f9cde86b44a13265d9a44c5738a711eb578fd
Changes in v3:
 * Add a testcase to prove css_task can be used in fentry.s
 * Link to v2:https://lore.kernel.org/bpf/[email protected]/T/#m14a97041ff56c2df21bc0149449abd275b73f6a3
Changes in v4:
 * Add Yonghong's ack for patch #1 and patch #2.
 * Solve Yonghong's comments for patch #2
 * Move prog 'iter_css_task_for_each_sleep' from iters_task_failure.c to
   iters_css_task.c. Use RUN_TESTS to prove we can load this prog.
 * Link to v3:https://lore.kernel.org/bpf/[email protected]/T/#m3200d8ad29af4ffab97588e297361d0a45d7585d

---
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Nov 7, 2023
When LAN9303 is MDIO-connected two callchains exist into
mdio->bus->write():

1. switch ports 1&2 ("physical" PHYs):

virtual (switch-internal) MDIO bus (lan9303_switch_ops->phy_{read|write})->
  lan9303_mdio_phy_{read|write} -> mdiobus_{read|write}_nested

2. LAN9303 virtual PHY:

virtual MDIO bus (lan9303_phy_{read|write}) ->
  lan9303_virt_phy_reg_{read|write} -> regmap -> lan9303_mdio_{read|write}

If the latter functions just take
mutex_lock(&sw_dev->device->bus->mdio_lock) it triggers a LOCKDEP
false-positive splat. It's false-positive because the first
mdio_lock in the second callchain above belongs to virtual MDIO bus, the
second mdio_lock belongs to physical MDIO bus.

Consequent annotation in lan9303_mdio_{read|write} as nested lock
(similar to lan9303_mdio_phy_{read|write}, it's the same physical MDIO bus)
prevents the following splat:

WARNING: possible circular locking dependency detected
5.15.71 #1 Not tainted
------------------------------------------------------
kworker/u4:3/609 is trying to acquire lock:
ffff000011531c68 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}, at: regmap_lock_mutex
but task is already holding lock:
ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&bus->mdio_lock){+.+.}-{3:3}:
       lock_acquire
       __mutex_lock
       mutex_lock_nested
       lan9303_mdio_read
       _regmap_read
       regmap_read
       lan9303_probe
       lan9303_mdio_probe
       mdio_probe
       really_probe
       __driver_probe_device
       driver_probe_device
       __device_attach_driver
       bus_for_each_drv
       __device_attach
       device_initial_probe
       bus_probe_device
       deferred_probe_work_func
       process_one_work
       worker_thread
       kthread
       ret_from_fork
-> #0 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}:
       __lock_acquire
       lock_acquire.part.0
       lock_acquire
       __mutex_lock
       mutex_lock_nested
       regmap_lock_mutex
       regmap_read
       lan9303_phy_read
       dsa_slave_phy_read
       __mdiobus_read
       mdiobus_read
       get_phy_device
       mdiobus_scan
       __mdiobus_register
       dsa_register_switch
       lan9303_probe
       lan9303_mdio_probe
       mdio_probe
       really_probe
       __driver_probe_device
       driver_probe_device
       __device_attach_driver
       bus_for_each_drv
       __device_attach
       device_initial_probe
       bus_probe_device
       deferred_probe_work_func
       process_one_work
       worker_thread
       kthread
       ret_from_fork
other info that might help us debug this:
 Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&bus->mdio_lock);
                               lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock);
                               lock(&bus->mdio_lock);
  lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock);
*** DEADLOCK ***
5 locks held by kworker/u4:3/609:
 #0: ffff000002842938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work
 #1: ffff80000bacbd60 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work
 #2: ffff000007645178 (&dev->mutex){....}-{3:3}, at: __device_attach
 #3: ffff8000096e6e78 (dsa2_mutex){+.+.}-{3:3}, at: dsa_register_switch
 #4: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read
stack backtrace:
CPU: 1 PID: 609 Comm: kworker/u4:3 Not tainted 5.15.71 #1
Workqueue: events_unbound deferred_probe_work_func
Call trace:
 dump_backtrace
 show_stack
 dump_stack_lvl
 dump_stack
 print_circular_bug
 check_noncircular
 __lock_acquire
 lock_acquire.part.0
 lock_acquire
 __mutex_lock
 mutex_lock_nested
 regmap_lock_mutex
 regmap_read
 lan9303_phy_read
 dsa_slave_phy_read
 __mdiobus_read
 mdiobus_read
 get_phy_device
 mdiobus_scan
 __mdiobus_register
 dsa_register_switch
 lan9303_probe
 lan9303_mdio_probe
...

Cc: [email protected]
Fixes: dc70058 ("net: dsa: LAN9303: add MDIO managed mode support")
Signed-off-by: Alexander Sverdlin <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
@ColinIanKing ColinIanKing deleted the dependabot/pip/drivers/gpu/drm/ci/xfails/urllib3-2.0.7 branch November 7, 2023 09:52
Copy link
Author

dependabot bot commented on behalf of github Nov 7, 2023

OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting @dependabot ignore this major version or @dependabot ignore this minor version.

If you change your mind, just re-open this PR and I'll resolve any conflicts on it.

ColinIanKing pushed a commit that referenced this pull request Nov 16, 2023
KMSAN reported the following uninit-value access issue:

=====================================================
BUG: KMSAN: uninit-value in virtio_transport_recv_pkt+0x1dfb/0x26a0 net/vmw_vsock/virtio_transport_common.c:1421
 virtio_transport_recv_pkt+0x1dfb/0x26a0 net/vmw_vsock/virtio_transport_common.c:1421
 vsock_loopback_work+0x3bb/0x5a0 net/vmw_vsock/vsock_loopback.c:120
 process_one_work kernel/workqueue.c:2630 [inline]
 process_scheduled_works+0xff6/0x1e60 kernel/workqueue.c:2703
 worker_thread+0xeca/0x14d0 kernel/workqueue.c:2784
 kthread+0x3cc/0x520 kernel/kthread.c:388
 ret_from_fork+0x66/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

Uninit was stored to memory at:
 virtio_transport_space_update net/vmw_vsock/virtio_transport_common.c:1274 [inline]
 virtio_transport_recv_pkt+0x1ee8/0x26a0 net/vmw_vsock/virtio_transport_common.c:1415
 vsock_loopback_work+0x3bb/0x5a0 net/vmw_vsock/vsock_loopback.c:120
 process_one_work kernel/workqueue.c:2630 [inline]
 process_scheduled_works+0xff6/0x1e60 kernel/workqueue.c:2703
 worker_thread+0xeca/0x14d0 kernel/workqueue.c:2784
 kthread+0x3cc/0x520 kernel/kthread.c:388
 ret_from_fork+0x66/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

Uninit was created at:
 slab_post_alloc_hook+0x105/0xad0 mm/slab.h:767
 slab_alloc_node mm/slub.c:3478 [inline]
 kmem_cache_alloc_node+0x5a2/0xaf0 mm/slub.c:3523
 kmalloc_reserve+0x13c/0x4a0 net/core/skbuff.c:559
 __alloc_skb+0x2fd/0x770 net/core/skbuff.c:650
 alloc_skb include/linux/skbuff.h:1286 [inline]
 virtio_vsock_alloc_skb include/linux/virtio_vsock.h:66 [inline]
 virtio_transport_alloc_skb+0x90/0x11e0 net/vmw_vsock/virtio_transport_common.c:58
 virtio_transport_reset_no_sock net/vmw_vsock/virtio_transport_common.c:957 [inline]
 virtio_transport_recv_pkt+0x1279/0x26a0 net/vmw_vsock/virtio_transport_common.c:1387
 vsock_loopback_work+0x3bb/0x5a0 net/vmw_vsock/vsock_loopback.c:120
 process_one_work kernel/workqueue.c:2630 [inline]
 process_scheduled_works+0xff6/0x1e60 kernel/workqueue.c:2703
 worker_thread+0xeca/0x14d0 kernel/workqueue.c:2784
 kthread+0x3cc/0x520 kernel/kthread.c:388
 ret_from_fork+0x66/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

CPU: 1 PID: 10664 Comm: kworker/1:5 Not tainted 6.6.0-rc3-00146-g9f3ebbef746f #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
Workqueue: vsock-loopback vsock_loopback_work
=====================================================

The following simple reproducer can cause the issue described above:

int main(void)
{
  int sock;
  struct sockaddr_vm addr = {
    .svm_family = AF_VSOCK,
    .svm_cid = VMADDR_CID_ANY,
    .svm_port = 1234,
  };

  sock = socket(AF_VSOCK, SOCK_STREAM, 0);
  connect(sock, (struct sockaddr *)&addr, sizeof(addr));
  return 0;
}

This issue occurs because the `buf_alloc` and `fwd_cnt` fields of the
`struct virtio_vsock_hdr` are not initialized when a new skb is allocated
in `virtio_transport_init_hdr()`. This patch resolves the issue by
initializing these fields during allocation.

Fixes: 71dc9ec ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")
Reported-and-tested-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=0c8ce1da0ac31abbadcd
Signed-off-by: Shigeru Yoshida <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Nov 16, 2023
Andrii Nakryiko says:

====================
BPF register bounds logic and testing improvements

This patch set adds a big set of manual and auto-generated test cases
validating BPF verifier's register bounds tracking and deduction logic. See
details in the last patch.

We start with building a tester that validates existing <range> vs <scalar>
verifier logic for range bounds. To make all this work, BPF verifier's logic
needed a bunch of improvements to handle some cases that previously were not
covered. This had no implications as to correctness of verifier logic, but it
was incomplete enough to cause significant disagreements with alternative
implementation of register bounds logic that tests in this patch set
implement. So we need BPF verifier logic improvements to make all the tests
pass. This is what we do in patches #3 through #9.

The end goal of this work, though, is to extend BPF verifier range state
tracking such as to allow to derive new range bounds when comparing non-const
registers. There is some more investigative work required to investigate and
fix existing potential issues with range tracking as part of ALU/ALU64
operations, so <range> x <range> part of v5 patch set ([0]) is dropped until
these issues are sorted out.

For now, we include preparatory refactorings and clean ups, that set up BPF
verifier code base to extend the logic to <range> vs <range> logic in
subsequent patch set. Patches #10-#16 perform preliminary refactorings without
functionally changing anything. But they do clean up check_cond_jmp_op() logic
and generalize a bunch of other pieces in is_branch_taken() logic.

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=797178&state=*

v5->v6:
  - dropped <range> vs <range> patches (original patches #18 through #23) to
    add more register range sanity checks and fix preexisting issues;
  - comments improvements, addressing other feedback on first 17 patches
    (Eduard, Alexei);
v4->v5:
  - added entirety of verifier reg bounds tracking changes, now handling
    <range> vs <range> cases (Alexei);
  - added way more comments trying to explain why deductions added are
    correct, hopefully they are useful and clarify things a bit (Daniel,
    Shung-Hsi);
  - added two preliminary selftests fixes necessary for RELEASE=1 build to
    work again, it keeps breaking.
v3->v4:
  - improvements to reg_bounds tester (progress report, split 32-bit and
    64-bit ranges, fix various verbosity output issues, etc);
v2->v3:
  - fix a subtle little-endianness assumption inside parge_reg_state() (CI);
v1->v2:
  - fix compilation when building selftests with llvm-16 toolchain (CI).
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Nov 16, 2023
Hou Tao says:

====================
From: Hou Tao <[email protected]>

Hi,

BPF CI failed due to map_percpu_stats_percpu_hash from time to time [1].
It seems that the failure reason is per-cpu bpf memory allocator may not
be able to allocate per-cpu pointer successfully and it can not refill
free llist timely, and bpf_map_update_elem() will return -ENOMEM.

Patch #1 fixes the size of value passed to per-cpu map update API. The
problem was found when fixing the ENOMEM problem, so also post it in
this patchset. Patch #2 & #3 mitigates the ENOMEM problem by retrying
the update operation for non-preallocated per-cpu map.

Please see individual patches for more details. And comments are always
welcome.

Regards,
Tao

[1]: https:/kernel-patches/bpf/actions/runs/6713177520/job/18244865326?pr=5909
====================

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Nov 16, 2023
Andrii Nakryiko says:

====================
BPF control flow graph and precision backtrack fixes

A small fix to BPF verifier's CFG logic around handling and reporting ldimm64
instructions. Patch #1 was previously submitted separately ([0]), and so this
patch set supersedes that patch.

Second patch is fixing obscure corner case in mark_chain_precise() logic. See
patch for details. Patch #3 adds a dedicated test, however fragile it might.

  [0] https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Nov 16, 2023
…mory

Introduce an ioctl(), KVM_CREATE_GUEST_MEMFD, to allow creating file-based
memory that is tied to a specific KVM virtual machine and whose primary
purpose is to serve guest memory.

A guest-first memory subsystem allows for optimizations and enhancements
that are kludgy or outright infeasible to implement/support in a generic
memory subsystem.  With guest_memfd, guest protections and mapping sizes
are fully decoupled from host userspace mappings.   E.g. KVM currently
doesn't support mapping memory as writable in the guest without it also
being writable in host userspace, as KVM's ABI uses VMA protections to
define the allow guest protection.  Userspace can fudge this by
establishing two mappings, a writable mapping for the guest and readable
one for itself, but that’s suboptimal on multiple fronts.

Similarly, KVM currently requires the guest mapping size to be a strict
subset of the host userspace mapping size, e.g. KVM doesn’t support
creating a 1GiB guest mapping unless userspace also has a 1GiB guest
mapping.  Decoupling the mappings sizes would allow userspace to precisely
map only what is needed without impacting guest performance, e.g. to
harden against unintentional accesses to guest memory.

Decoupling guest and userspace mappings may also allow for a cleaner
alternative to high-granularity mappings for HugeTLB, which has reached a
bit of an impasse and is unlikely to ever be merged.

A guest-first memory subsystem also provides clearer line of sight to
things like a dedicated memory pool (for slice-of-hardware VMs) and
elimination of "struct page" (for offload setups where userspace _never_
needs to mmap() guest memory).

More immediately, being able to map memory into KVM guests without mapping
said memory into the host is critical for Confidential VMs (CoCo VMs), the
initial use case for guest_memfd.  While AMD's SEV and Intel's TDX prevent
untrusted software from reading guest private data by encrypting guest
memory with a key that isn't usable by the untrusted host, projects such
as Protected KVM (pKVM) provide confidentiality and integrity *without*
relying on memory encryption.  And with SEV-SNP and TDX, accessing guest
private memory can be fatal to the host, i.e. KVM must be prevent host
userspace from accessing guest memory irrespective of hardware behavior.

Attempt #1 to support CoCo VMs was to add a VMA flag to mark memory as
being mappable only by KVM (or a similarly enlightened kernel subsystem).
That approach was abandoned largely due to it needing to play games with
PROT_NONE to prevent userspace from accessing guest memory.

Attempt #2 to was to usurp PG_hwpoison to prevent the host from mapping
guest private memory into userspace, but that approach failed to meet
several requirements for software-based CoCo VMs, e.g. pKVM, as the kernel
wouldn't easily be able to enforce a 1:1 page:guest association, let alone
a 1:1 pfn:gfn mapping.  And using PG_hwpoison does not work for memory
that isn't backed by 'struct page', e.g. if devices gain support for
exposing encrypted memory regions to guests.

Attempt #3 was to extend the memfd() syscall and wrap shmem to provide
dedicated file-based guest memory.  That approach made it as far as v10
before feedback from Hugh Dickins and Christian Brauner (and others) led
to it demise.

Hugh's objection was that piggybacking shmem made no sense for KVM's use
case as KVM didn't actually *want* the features provided by shmem.  I.e.
KVM was using memfd() and shmem to avoid having to manage memory directly,
not because memfd() and shmem were the optimal solution, e.g. things like
read/write/mmap in shmem were dead weight.

Christian pointed out flaws with implementing a partial overlay (wrapping
only _some_ of shmem), e.g. poking at inode_operations or super_operations
would show shmem stuff, but address_space_operations and file_operations
would show KVM's overlay.  Paraphrashing heavily, Christian suggested KVM
stop being lazy and create a proper API.

Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/all/20230418-anfallen-irdisch-6993a61be10b@brauner
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/linux-mm/20230306191944.GA15773@monkey
Link: https://lore.kernel.org/linux-mm/[email protected]
Cc: Fuad Tabba <[email protected]>
Cc: Vishal Annapurve <[email protected]>
Cc: Ackerley Tng <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Maciej Szmigiero <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Quentin Perret <[email protected]>
Cc: Michael Roth <[email protected]>
Cc: Wang <[email protected]>
Cc: Liam Merwick <[email protected]>
Cc: Isaku Yamahata <[email protected]>
Co-developed-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Co-developed-by: Yu Zhang <[email protected]>
Signed-off-by: Yu Zhang <[email protected]>
Co-developed-by: Chao Peng <[email protected]>
Signed-off-by: Chao Peng <[email protected]>
Co-developed-by: Ackerley Tng <[email protected]>
Signed-off-by: Ackerley Tng <[email protected]>
Co-developed-by: Isaku Yamahata <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
Co-developed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Fuad Tabba <[email protected]>
Tested-by: Fuad Tabba <[email protected]>
Reviewed-by: Xiaoyao Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Nov 16, 2023
This allows it to break the following circular locking dependency.

Aug 10 07:01:29 dg1test kernel: ======================================================
Aug 10 07:01:29 dg1test kernel: WARNING: possible circular locking dependency detected
Aug 10 07:01:29 dg1test kernel: 6.4.0-rc7+ #10 Not tainted
Aug 10 07:01:29 dg1test kernel: ------------------------------------------------------
Aug 10 07:01:29 dg1test kernel: wireplumber/2236 is trying to acquire lock:
Aug 10 07:01:29 dg1test kernel: ffff8fca5320da18 (&fctx->lock){-...}-{2:2}, at: nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau]
Aug 10 07:01:29 dg1test kernel:
                                but task is already holding lock:
Aug 10 07:01:29 dg1test kernel: ffff8fca41208610 (&event->list_lock#2){-...}-{2:2}, at: nvkm_event_ntfy+0x50/0xf0 [nouveau]
Aug 10 07:01:29 dg1test kernel:
                                which lock already depends on the new lock.
Aug 10 07:01:29 dg1test kernel:
                                the existing dependency chain (in reverse order) is:
Aug 10 07:01:29 dg1test kernel:
                                -> #3 (&event->list_lock#2){-...}-{2:2}:
Aug 10 07:01:29 dg1test kernel:        _raw_spin_lock_irqsave+0x4b/0x70
Aug 10 07:01:29 dg1test kernel:        nvkm_event_ntfy+0x50/0xf0 [nouveau]
Aug 10 07:01:29 dg1test kernel:        ga100_fifo_nonstall_intr+0x24/0x30 [nouveau]
Aug 10 07:01:29 dg1test kernel:        nvkm_intr+0x12c/0x240 [nouveau]
Aug 10 07:01:29 dg1test kernel:        __handle_irq_event_percpu+0x88/0x240
Aug 10 07:01:29 dg1test kernel:        handle_irq_event+0x38/0x80
Aug 10 07:01:29 dg1test kernel:        handle_edge_irq+0xa3/0x240
Aug 10 07:01:29 dg1test kernel:        __common_interrupt+0x72/0x160
Aug 10 07:01:29 dg1test kernel:        common_interrupt+0x60/0xe0
Aug 10 07:01:29 dg1test kernel:        asm_common_interrupt+0x26/0x40
Aug 10 07:01:29 dg1test kernel:
                                -> #2 (&device->intr.lock){-...}-{2:2}:
Aug 10 07:01:29 dg1test kernel:        _raw_spin_lock_irqsave+0x4b/0x70
Aug 10 07:01:29 dg1test kernel:        nvkm_inth_allow+0x2c/0x80 [nouveau]
Aug 10 07:01:29 dg1test kernel:        nvkm_event_ntfy_state+0x181/0x250 [nouveau]
Aug 10 07:01:29 dg1test kernel:        nvkm_event_ntfy_allow+0x63/0xd0 [nouveau]
Aug 10 07:01:29 dg1test kernel:        nvkm_uevent_mthd+0x4d/0x70 [nouveau]
Aug 10 07:01:29 dg1test kernel:        nvkm_ioctl+0x10b/0x250 [nouveau]
Aug 10 07:01:29 dg1test kernel:        nvif_object_mthd+0xa8/0x1f0 [nouveau]
Aug 10 07:01:29 dg1test kernel:        nvif_event_allow+0x2a/0xa0 [nouveau]
Aug 10 07:01:29 dg1test kernel:        nouveau_fence_enable_signaling+0x78/0x80 [nouveau]
Aug 10 07:01:29 dg1test kernel:        __dma_fence_enable_signaling+0x5e/0x100
Aug 10 07:01:29 dg1test kernel:        dma_fence_add_callback+0x4b/0xd0
Aug 10 07:01:29 dg1test kernel:        nouveau_cli_work_queue+0xae/0x110 [nouveau]
Aug 10 07:01:29 dg1test kernel:        nouveau_gem_object_close+0x1d1/0x2a0 [nouveau]
Aug 10 07:01:29 dg1test kernel:        drm_gem_handle_delete+0x70/0xe0 [drm]
Aug 10 07:01:29 dg1test kernel:        drm_ioctl_kernel+0xa5/0x150 [drm]
Aug 10 07:01:29 dg1test kernel:        drm_ioctl+0x256/0x490 [drm]
Aug 10 07:01:29 dg1test kernel:        nouveau_drm_ioctl+0x5a/0xb0 [nouveau]
Aug 10 07:01:29 dg1test kernel:        __x64_sys_ioctl+0x91/0xd0
Aug 10 07:01:29 dg1test kernel:        do_syscall_64+0x3c/0x90
Aug 10 07:01:29 dg1test kernel:        entry_SYSCALL_64_after_hwframe+0x72/0xdc
Aug 10 07:01:29 dg1test kernel:
                                -> #1 (&event->refs_lock#4){....}-{2:2}:
Aug 10 07:01:29 dg1test kernel:        _raw_spin_lock_irqsave+0x4b/0x70
Aug 10 07:01:29 dg1test kernel:        nvkm_event_ntfy_state+0x37/0x250 [nouveau]
Aug 10 07:01:29 dg1test kernel:        nvkm_event_ntfy_allow+0x63/0xd0 [nouveau]
Aug 10 07:01:29 dg1test kernel:        nvkm_uevent_mthd+0x4d/0x70 [nouveau]
Aug 10 07:01:29 dg1test kernel:        nvkm_ioctl+0x10b/0x250 [nouveau]
Aug 10 07:01:29 dg1test kernel:        nvif_object_mthd+0xa8/0x1f0 [nouveau]
Aug 10 07:01:29 dg1test kernel:        nvif_event_allow+0x2a/0xa0 [nouveau]
Aug 10 07:01:29 dg1test kernel:        nouveau_fence_enable_signaling+0x78/0x80 [nouveau]
Aug 10 07:01:29 dg1test kernel:        __dma_fence_enable_signaling+0x5e/0x100
Aug 10 07:01:29 dg1test kernel:        dma_fence_add_callback+0x4b/0xd0
Aug 10 07:01:29 dg1test kernel:        nouveau_cli_work_queue+0xae/0x110 [nouveau]
Aug 10 07:01:29 dg1test kernel:        nouveau_gem_object_close+0x1d1/0x2a0 [nouveau]
Aug 10 07:01:29 dg1test kernel:        drm_gem_handle_delete+0x70/0xe0 [drm]
Aug 10 07:01:29 dg1test kernel:        drm_ioctl_kernel+0xa5/0x150 [drm]
Aug 10 07:01:29 dg1test kernel:        drm_ioctl+0x256/0x490 [drm]
Aug 10 07:01:29 dg1test kernel:        nouveau_drm_ioctl+0x5a/0xb0 [nouveau]
Aug 10 07:01:29 dg1test kernel:        __x64_sys_ioctl+0x91/0xd0
Aug 10 07:01:29 dg1test kernel:        do_syscall_64+0x3c/0x90
Aug 10 07:01:29 dg1test kernel:        entry_SYSCALL_64_after_hwframe+0x72/0xdc
Aug 10 07:01:29 dg1test kernel:
                                -> #0 (&fctx->lock){-...}-{2:2}:
Aug 10 07:01:29 dg1test kernel:        __lock_acquire+0x14e3/0x2240
Aug 10 07:01:29 dg1test kernel:        lock_acquire+0xc8/0x2a0
Aug 10 07:01:29 dg1test kernel:        _raw_spin_lock_irqsave+0x4b/0x70
Aug 10 07:01:29 dg1test kernel:        nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau]
Aug 10 07:01:29 dg1test kernel:        nvkm_client_event+0xf/0x20 [nouveau]
Aug 10 07:01:29 dg1test kernel:        nvkm_event_ntfy+0x9b/0xf0 [nouveau]
Aug 10 07:01:29 dg1test kernel:        ga100_fifo_nonstall_intr+0x24/0x30 [nouveau]
Aug 10 07:01:29 dg1test kernel:        nvkm_intr+0x12c/0x240 [nouveau]
Aug 10 07:01:29 dg1test kernel:        __handle_irq_event_percpu+0x88/0x240
Aug 10 07:01:29 dg1test kernel:        handle_irq_event+0x38/0x80
Aug 10 07:01:29 dg1test kernel:        handle_edge_irq+0xa3/0x240
Aug 10 07:01:29 dg1test kernel:        __common_interrupt+0x72/0x160
Aug 10 07:01:29 dg1test kernel:        common_interrupt+0x60/0xe0
Aug 10 07:01:29 dg1test kernel:        asm_common_interrupt+0x26/0x40
Aug 10 07:01:29 dg1test kernel:
                                other info that might help us debug this:
Aug 10 07:01:29 dg1test kernel: Chain exists of:
                                  &fctx->lock --> &device->intr.lock --> &event->list_lock#2
Aug 10 07:01:29 dg1test kernel:  Possible unsafe locking scenario:
Aug 10 07:01:29 dg1test kernel:        CPU0                    CPU1
Aug 10 07:01:29 dg1test kernel:        ----                    ----
Aug 10 07:01:29 dg1test kernel:   lock(&event->list_lock#2);
Aug 10 07:01:29 dg1test kernel:                                lock(&device->intr.lock);
Aug 10 07:01:29 dg1test kernel:                                lock(&event->list_lock#2);
Aug 10 07:01:29 dg1test kernel:   lock(&fctx->lock);
Aug 10 07:01:29 dg1test kernel:
                                 *** DEADLOCK ***
Aug 10 07:01:29 dg1test kernel: 2 locks held by wireplumber/2236:
Aug 10 07:01:29 dg1test kernel:  #0: ffff8fca53177bf8 (&device->intr.lock){-...}-{2:2}, at: nvkm_intr+0x29/0x240 [nouveau]
Aug 10 07:01:29 dg1test kernel:  #1: ffff8fca41208610 (&event->list_lock#2){-...}-{2:2}, at: nvkm_event_ntfy+0x50/0xf0 [nouveau]
Aug 10 07:01:29 dg1test kernel:
                                stack backtrace:
Aug 10 07:01:29 dg1test kernel: CPU: 6 PID: 2236 Comm: wireplumber Not tainted 6.4.0-rc7+ #10
Aug 10 07:01:29 dg1test kernel: Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021
Aug 10 07:01:29 dg1test kernel: Call Trace:
Aug 10 07:01:29 dg1test kernel:  <TASK>
Aug 10 07:01:29 dg1test kernel:  dump_stack_lvl+0x5b/0x90
Aug 10 07:01:29 dg1test kernel:  check_noncircular+0xe2/0x110
Aug 10 07:01:29 dg1test kernel:  __lock_acquire+0x14e3/0x2240
Aug 10 07:01:29 dg1test kernel:  lock_acquire+0xc8/0x2a0
Aug 10 07:01:29 dg1test kernel:  ? nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau]
Aug 10 07:01:29 dg1test kernel:  ? lock_acquire+0xc8/0x2a0
Aug 10 07:01:29 dg1test kernel:  _raw_spin_lock_irqsave+0x4b/0x70
Aug 10 07:01:29 dg1test kernel:  ? nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau]
Aug 10 07:01:29 dg1test kernel:  nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau]
Aug 10 07:01:29 dg1test kernel:  nvkm_client_event+0xf/0x20 [nouveau]
Aug 10 07:01:29 dg1test kernel:  nvkm_event_ntfy+0x9b/0xf0 [nouveau]
Aug 10 07:01:29 dg1test kernel:  ga100_fifo_nonstall_intr+0x24/0x30 [nouveau]
Aug 10 07:01:29 dg1test kernel:  nvkm_intr+0x12c/0x240 [nouveau]
Aug 10 07:01:29 dg1test kernel:  __handle_irq_event_percpu+0x88/0x240
Aug 10 07:01:29 dg1test kernel:  handle_irq_event+0x38/0x80
Aug 10 07:01:29 dg1test kernel:  handle_edge_irq+0xa3/0x240
Aug 10 07:01:29 dg1test kernel:  __common_interrupt+0x72/0x160
Aug 10 07:01:29 dg1test kernel:  common_interrupt+0x60/0xe0
Aug 10 07:01:29 dg1test kernel:  asm_common_interrupt+0x26/0x40
Aug 10 07:01:29 dg1test kernel: RIP: 0033:0x7fb66174d700
Aug 10 07:01:29 dg1test kernel: Code: c1 e2 05 29 ca 8d 0c 10 0f be 07 84 c0 75 eb 89 c8 c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa e9 d7 0f fc ff 0f 1f 80 00 00 00 00 <f3> 0f 1e fa e9 c7 0f fc>
Aug 10 07:01:29 dg1test kernel: RSP: 002b:00007ffdd3c48438 EFLAGS: 00000206
Aug 10 07:01:29 dg1test kernel: RAX: 000055bb758763c0 RBX: 000055bb758752c0 RCX: 00000000000028b0
Aug 10 07:01:29 dg1test kernel: RDX: 000055bb758752c0 RSI: 000055bb75887490 RDI: 000055bb75862950
Aug 10 07:01:29 dg1test kernel: RBP: 00007ffdd3c48490 R08: 000055bb75873b10 R09: 0000000000000001
Aug 10 07:01:29 dg1test kernel: R10: 0000000000000004 R11: 000055bb7587f000 R12: 000055bb75887490
Aug 10 07:01:29 dg1test kernel: R13: 000055bb757f6280 R14: 000055bb758875c0 R15: 000055bb757f6280
Aug 10 07:01:29 dg1test kernel:  </TASK>

Signed-off-by: Dave Airlie <[email protected]>
Tested-by: Danilo Krummrich <[email protected]>
Reviewed-by: Danilo Krummrich <[email protected]>
Signed-off-by: Danilo Krummrich <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
ColinIanKing pushed a commit that referenced this pull request Nov 16, 2023
Andrii Nakryiko says:

====================
BPF register bounds range vs range support

This patch set is a continuation of work started in [0]. It adds a big set of
manual, auto-generated, and now also random test cases validating BPF
verifier's register bounds tracking and deduction logic.

First few patches generalize verifier's logic to handle conditional jumps and
corresponding range adjustments in case when two non-const registers are
compared to each other. Patch #1 generalizes reg_set_min_max() portion, while
patch #2 does the same for is_branch_taken() part of the overall solution.

Patch #3 improves equality and inequality for cases when BPF program code
mixes 64-bit and 32-bit uses of the same register. Depending on specific
sequence, it's possible to get to the point where u64/s64 bounds will be very
generic (e.g., after signed 32-bit comparison), while we still keep pretty
tight u32/s32 bounds. If in such state we proceed with 32-bit equality or
inequality comparison, reg_set_min_max() might have to deal with adjusting s32
bounds for two registers that don't overlap, which breaks reg_set_min_max().
This doesn't manifest in <range> vs <const> cases, because if that happens
reg_set_min_max() in effect will force s32 bounds to be a new "impossible"
constant (from original smin32/smax32 bounds point of view). Things get tricky
when we have <range> vs <range> adjustments, so instead of trying to somehow
make sense out of such situations, it's best to detect such impossible
situations and prune the branch that can't be taken in is_branch_taken()
logic.  This equality/inequality was the only such category of situations with
auto-generated tests added later in the patch set.

But when we start mixing arithmetic operations in different numeric domains
and conditionals, things get even hairier. So, patch #4 adds sanity checking
logic after all ALU/ALU64, JMP/JMP32, and LDX operations. By default, instead
of failing verification, we conservatively reset range bounds to unknown
values, reporting violation in verifier log (if verbose logs are requested).
But to aid development, detection, and debugging, we also introduce a new test
flag, BPF_F_TEST_SANITY_STRICT, which triggers verification failure on range
sanity violation.

Patch #11 sets BPF_F_TEST_SANITY_STRICT by default for test_progs and
test_verifier. Patch #12 adds support for controlling this in veristat for
testing with production BPF object files.

Getting back to BPF verifier, patches #5 and #6 complete verifier's range
tracking logic clean up. See respective patches for details.

With kernel-side taken care of, we move to testing. We start with building
a tester that validates existing <range> vs <scalar> verifier logic for range
bounds. Patch #7 implements an initial version of such a tester. We guard
millions of generated tests behind SLOW_TESTS=1 envvar requirement, but also
have a relatively small number of tricky cases that came up during development
and debugging of this work. Those will be executed as part of a normal
test_progs run.

Patch #8 simulates more nuanced JEQ/JNE logic we added to verifier in patch #3.
Patch #9 adds <range> vs <range> "slow tests".

Patch #10 is a completely new one, it adds a bunch of randomly generated cases
to be run normally, without SLOW_TESTS=1 guard. This should help to get
a bunch of cover, and hopefully find some remaining latent problems if
verifier proactively as part of normal BPF CI runs.

Finally, a tiny test which was, amazingly, an initial motivation for this
whole work, is added in lucky patch #13, demonstrating how verifier is now
smart enough to track actual number of elements in the array and won't require
additional checks on loop iteration variable inside the bpf_for() open-coded
iterator loop.

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=798308&state=*

v1->v2:
  - use x < y => y > x property to minimize reg_set_min_max (Eduard);
  - fix for JEQ/JNE logic in reg_bounds.c (Eduard);
  - split BPF_JSET and !BPF_JSET cases handling (Shung-Hsi);
  - adjustments to reg_bounds.c to make it easier to follow (Alexei);
  - added acks (Eduard, Shung-Hsi).
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Nov 20, 2023
syzbot found a potential circular dependency leading to a deadlock:
    -> #3 (&hdev->req_lock){+.+.}-{3:3}:
    __mutex_lock_common+0x1b6/0x1bc2 kernel/locking/mutex.c:599
    __mutex_lock kernel/locking/mutex.c:732 [inline]
    mutex_lock_nested+0x17/0x1c kernel/locking/mutex.c:784
    hci_dev_do_close+0x3f/0x9f net/bluetooth/hci_core.c:551
    hci_rfkill_set_block+0x130/0x1ac net/bluetooth/hci_core.c:935
    rfkill_set_block+0x1e6/0x3b8 net/rfkill/core.c:345
    rfkill_fop_write+0x2d8/0x672 net/rfkill/core.c:1274
    vfs_write+0x277/0xcf5 fs/read_write.c:594
    ksys_write+0x19b/0x2bd fs/read_write.c:650
    do_syscall_x64 arch/x86/entry/common.c:55 [inline]
    do_syscall_64+0x51/0xba arch/x86/entry/common.c:93
    entry_SYSCALL_64_after_hwframe+0x61/0xcb

    -> #2 (rfkill_global_mutex){+.+.}-{3:3}:
    __mutex_lock_common+0x1b6/0x1bc2 kernel/locking/mutex.c:599
    __mutex_lock kernel/locking/mutex.c:732 [inline]
    mutex_lock_nested+0x17/0x1c kernel/locking/mutex.c:784
    rfkill_register+0x30/0x7e3 net/rfkill/core.c:1045
    hci_register_dev+0x48f/0x96d net/bluetooth/hci_core.c:2622
    __vhci_create_device drivers/bluetooth/hci_vhci.c:341 [inline]
    vhci_create_device+0x3ad/0x68f drivers/bluetooth/hci_vhci.c:374
    vhci_get_user drivers/bluetooth/hci_vhci.c:431 [inline]
    vhci_write+0x37b/0x429 drivers/bluetooth/hci_vhci.c:511
    call_write_iter include/linux/fs.h:2109 [inline]
    new_sync_write fs/read_write.c:509 [inline]
    vfs_write+0xaa8/0xcf5 fs/read_write.c:596
    ksys_write+0x19b/0x2bd fs/read_write.c:650
    do_syscall_x64 arch/x86/entry/common.c:55 [inline]
    do_syscall_64+0x51/0xba arch/x86/entry/common.c:93
    entry_SYSCALL_64_after_hwframe+0x61/0xcb

    -> #1 (&data->open_mutex){+.+.}-{3:3}:
    __mutex_lock_common+0x1b6/0x1bc2 kernel/locking/mutex.c:599
    __mutex_lock kernel/locking/mutex.c:732 [inline]
    mutex_lock_nested+0x17/0x1c kernel/locking/mutex.c:784
    vhci_send_frame+0x68/0x9c drivers/bluetooth/hci_vhci.c:75
    hci_send_frame+0x1cc/0x2ff net/bluetooth/hci_core.c:2989
    hci_sched_acl_pkt net/bluetooth/hci_core.c:3498 [inline]
    hci_sched_acl net/bluetooth/hci_core.c:3583 [inline]
    hci_tx_work+0xb94/0x1a60 net/bluetooth/hci_core.c:3654
    process_one_work+0x901/0xfb8 kernel/workqueue.c:2310
    worker_thread+0xa67/0x1003 kernel/workqueue.c:2457
    kthread+0x36a/0x430 kernel/kthread.c:319
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298

    -> #0 ((work_completion)(&hdev->tx_work)){+.+.}-{0:0}:
    check_prev_add kernel/locking/lockdep.c:3053 [inline]
    check_prevs_add kernel/locking/lockdep.c:3172 [inline]
    validate_chain kernel/locking/lockdep.c:3787 [inline]
    __lock_acquire+0x2d32/0x77fa kernel/locking/lockdep.c:5011
    lock_acquire+0x273/0x4d5 kernel/locking/lockdep.c:5622
    __flush_work+0xee/0x19f kernel/workqueue.c:3090
    hci_dev_close_sync+0x32f/0x1113 net/bluetooth/hci_sync.c:4352
    hci_dev_do_close+0x47/0x9f net/bluetooth/hci_core.c:553
    hci_rfkill_set_block+0x130/0x1ac net/bluetooth/hci_core.c:935
    rfkill_set_block+0x1e6/0x3b8 net/rfkill/core.c:345
    rfkill_fop_write+0x2d8/0x672 net/rfkill/core.c:1274
    vfs_write+0x277/0xcf5 fs/read_write.c:594
    ksys_write+0x19b/0x2bd fs/read_write.c:650
    do_syscall_x64 arch/x86/entry/common.c:55 [inline]
    do_syscall_64+0x51/0xba arch/x86/entry/common.c:93
    entry_SYSCALL_64_after_hwframe+0x61/0xcb

This change removes the need for acquiring the open_mutex in
vhci_send_frame, thus eliminating the potential deadlock while
maintaining the required packet ordering.

Fixes: 92d4abd ("Bluetooth: vhci: Fix race when opening vhci device")
Signed-off-by: Ying Hsu <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Nov 20, 2023
Andrii Nakryiko says:

====================
BPF verifier log improvements

This patch set moves a big chunk of verifier log related code from gigantic
verifier.c file into more focused kernel/bpf/log.c. This is not essential to
the rest of functionality in this patch set, so I can undo it, but it felt
like it's good to start chipping away from 20K+ verifier.c whenever we can.

The main purpose of the patch set, though, is in improving verifier log
further.

Patches #3-#4 start printing out register state even if that register is
spilled into stack slot. Previously we'd get only spilled register type, but
no additional information, like SCALAR_VALUE's ranges. Super limiting during
debugging. For cases of register spills smaller than 8 bytes, we also print
out STACK_MISC/STACK_ZERO/STACK_INVALID markers. This, among other things,
will make it easier to write tests for these mixed spill/misc cases.

Patch #5 prints map name for PTR_TO_MAP_VALUE/PTR_TO_MAP_KEY/CONST_PTR_TO_MAP
registers. In big production BPF programs, it's important to map assembly to
actual map, and it's often non-trivial. Having map name helps.

Patch #6 just removes visual noise in form of ubiquitous imm=0 and off=0. They
are default values, omit them.

Patch #7 is probably the most controversial, but it reworks how verifier log
prints numbers. For small valued integers we use decimals, but for large ones
we switch to hexadecimal. From personal experience this is a much more useful
convention. We can tune what consitutes "small value", for now it's 16-bit
range.

Patch #8 prints frame number for PTR_TO_CTX registers, if that frame is
different from the "current" one. This removes ambiguity and confusion,
especially in complicated cases with multiple subprogs passing around
pointers.

v2->v3:
  - adjust reg_bounds tester to parse hex form of reg state as well;
  - print reg->range as unsigned (Alexei);
v1->v2:
  - use verbose_snum() for range and offset in register state (Eduard);
  - fixed typos and added acks from Eduard and Stanislav.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Nov 23, 2023
…f-times'

Eduard Zingerman says:

====================
verify callbacks as if they are called unknown number of times

This series updates verifier logic for callback functions handling.
Current master simulates callback body execution exactly once,
which leads to verifier not detecting unsafe programs like below:

    static int unsafe_on_zero_iter_cb(__u32 idx, struct num_context *ctx)
    {
        ctx->i = 0;
        return 0;
    }

    SEC("?raw_tp")
    int unsafe_on_zero_iter(void *unused)
    {
        struct num_context loop_ctx = { .i = 32 };
        __u8 choice_arr[2] = { 0, 1 };

        bpf_loop(100, unsafe_on_zero_iter_cb, &loop_ctx, 0);
        return choice_arr[loop_ctx.i];
    }

This was reported previously in [0].
The basic idea of the fix is to schedule callback entry state for
verification in env->head until some identical, previously visited
state in current DFS state traversal is found. Same logic as with open
coded iterators, and builds on top recent fixes [1] for those.

The series is structured as follows:
- patches #1,2,3 update strobemeta, xdp_synproxy selftests and
  bpf_loop_bench benchmark to allow convergence of the bpf_loop
  callback states;
- patches #4,5 just shuffle the code a bit;
- patch #6 is the main part of the series;
- patch #7 adds test cases for #6;
- patch #8 extend patch #6 with same speculative scalar widening
  logic, as used for open coded iterators;
- patch #9 adds test cases for #8;
- patch #10 extends patch #6 to track maximal number of callback
  executions specifically for bpf_loop();
- patch #11 adds test cases for #10.

Veristat results comparing this series to master+patches #1,2,3 using selftests
show the following difference:

File                       Program        States (A)  States (B)  States (DIFF)
-------------------------  -------------  ----------  ----------  -------------
bpf_loop_bench.bpf.o       benchmark               1           2  +1 (+100.00%)
pyperf600_bpf_loop.bpf.o   on_event              322         407  +85 (+26.40%)
strobemeta_bpf_loop.bpf.o  on_event              113         151  +38 (+33.63%)
xdp_synproxy_kern.bpf.o    syncookie_tc          341         291  -50 (-14.66%)
xdp_synproxy_kern.bpf.o    syncookie_xdp         344         301  -43 (-12.50%)

Veristat results comparing this series to master using Tetragon BPF
files [2] also show some differences.
States diff varies from +2% to +15% on 23 programs out of 186,
no new failures.

Changelog:
- V3 [5] -> V4, changes suggested by Andrii:
  - validate mark_chain_precision() result in patch #10;
  - renaming s/cumulative_callback_depth/callback_unroll_depth/.
- V2 [4] -> V3:
  - fixes in expected log messages for test cases:
    - callback_result_precise;
    - parent_callee_saved_reg_precise_with_callback;
    - parent_stack_slot_precise_with_callback;
  - renamings (suggested by Alexei):
    - s/callback_iter_depth/cumulative_callback_depth/
    - s/is_callback_iter_next/calls_callback/
    - s/mark_callback_iter_next/mark_calls_callback/
  - prepare_func_exit() updated to exit with -EFAULT when
    callee->in_callback_fn is true but calls_callback() is not true
    for callsite;
  - test case 'bpf_loop_iter_limit_nested' rewritten to use return
    value check instead of verifier log message checks
    (suggested by Alexei).
- V1 [3] -> V2, changes suggested by Andrii:
  - small changes for error handling code in __check_func_call();
  - callback body processing log is now matched in relevant
    verifier_subprog_precision.c tests;
  - R1 passed to bpf_loop() is now always marked as precise;
  - log level 2 message for bpf_loop() iteration termination instead of
    iteration depth messages;
  - __no_msg macro removed;
  - bpf_loop_iter_limit_nested updated to avoid using __no_msg;
  - commit message for patch #3 updated according to Alexei's request.

[0] https://lore.kernel.org/bpf/CA+vRuzPChFNXmouzGG+wsy=6eMcfr1mFG0F3g7rbg-sedGKW3w@mail.gmail.com/
[1] https://lore.kernel.org/bpf/[email protected]/
[2] [email protected]:cilium/tetragon.git
[3] https://lore.kernel.org/bpf/[email protected]/T/#t
[4] https://lore.kernel.org/bpf/[email protected]/T/#t
[5] https://lore.kernel.org/bpf/[email protected]/T/#t
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Nov 23, 2023
Petr Machata says:

====================
mlxsw: Preparations for support of CFF flood mode

PGT is an in-HW table that maps addresses to sets of ports. Then when some
HW process needs a set of ports as an argument, instead of embedding the
actual set in the dynamic configuration, what gets configured is the
address referencing the set. The HW then works with the appropriate PGT
entry.

Among other allocations, the PGT currently contains two large blocks for
bridge flooding: one for 802.1q and one for 802.1d. Within each of these
blocks are three tables, for unknown-unicast, multicast and broadcast
flooding:

      . . . |    802.1q    |    802.1d    | . . .
            | UC | MC | BC | UC | MC | BC |
             \______ _____/ \_____ ______/
                    v             v
                   FID flood vectors

Thus each FID (which corresponds to an 802.1d bridge or one VLAN in an
802.1q bridge) uses three flood vectors spread across a fairly large region
of PGT.

This way of organizing the flood table (called "controlled") is not very
flexible. E.g. to decrease a bridge scale and store more IP MC vectors, one
would need to completely rewrite the bridge PGT blocks, or resort to hacks
such as storing individual MC flood vectors into unused part of the bridge
table.

In order to address these shortcomings, Spectrum-2 and above support what
is called CFF flood mode, for Compressed FID Flooding. In CFF flood mode,
each FID has a little table of its own, with three entries adjacent to each
other, one for unknown-UC, one for MC, one for BC. This allows for a much
more fine-grained approach to PGT management, where bits of it are
allocated on demand.

      . . . | FID | FID | FID | FID | FID | . . .
            |U|M|B|U|M|B|U|M|B|U|M|B|U|M|B|
             \_____________ _____________/
                           v
                   FID flood vectors

Besides the FID table organization, the CFF flood mode also impacts Router
Subport (RSP) table. This table contains flood vectors for rFIDs, which are
FIDs that reference front panel ports or LAGs. The RSP table contains two
entries per front panel port and LAG, one for unknown-UC traffic, and one
for everything else. Currently, the FW allocates and manages the table in
its own part of PGT. rFIDs are marked with flood_rsp bit and managed
specially. In CFF mode, rFIDs are managed as all other FIDs. The driver
therefore has to allocate and maintain the flood vectors. Like with bridge
FIDs, this is more work, but increases flexibility of the system.

The FW currently supports both the controlled and CFF flood modes. To shed
complexity, in the future it should only support CFF flood mode. Hence this
patchset, which is the first in series of two to add CFF flood mode support
to mlxsw.

There are FW versions out there that do not support CFF flood mode, and on
Spectrum-1 in particular, there is no plan to support it at all. mlxsw will
therefore have to support both controlled flood mode as well as CFF.

Another aspect is that at least on Spectrum-1, there are FW versions out
there that claim to support CFF flood mode, but then reject or ignore
configurations enabling the same. The driver thus has to have a say in
whether an attempt to configure CFF flood mode should even be made.

Much like with the LAG mode, the feature is therefore expressed in terms of
"does the driver prefer CFF flood mode?", and "what flood mode the PCI
module managed to configure the FW with". This gives to the driver a chance
to determine whether CFF flood mode configuration should be attempted.

In this patchset, we lay the ground with new definitions, registers and
their fields, and some minor code shaping. The next patchset will be more
focused on introducing necessary abstractions and implementation.

- Patches #1 and #2 add CFF-related items to the command interface.

- Patch #3 adds a new resource, for maximum number of flood profiles
  supported. (A flood profile is a mapping between traffic type and offset
  in the per-FID flood vector table.)

- Patches #4 to #8 adjust reg.h. The SFFP register is added, which is used
  for configuring the abovementioned traffic-type-to-offset mapping. The
  SFMR, register, which serves for FID configuration, is extended with
  fields specific to CFF mode. And other minor adjustments.

- Patches #9 and #10 add the plumbing for CFF mode: a way to request that
  CFF flood mode be configured, and a way to query the flood mode that was
  actually configured.

- Patch #11 removes dead code.

- Patches #12 and #13 add helpers that the next patchset will make use of.
  Patch #14 moves RIF setup ahead so that FID code can make use of it.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Dec 4, 2023
When scanning namespaces, it is possible to get valid data from the first
call to nvme_identify_ns() in nvme_alloc_ns(), but not from the second
call in nvme_update_ns_info_block().  In particular, if the NSID becomes
inactive between the two commands, a storage device may return a buffer
filled with zero as per 4.1.5.1.  In this case, we can get a kernel crash
due to a divide-by-zero in blk_stack_limits() because ns->lba_shift will
be set to zero.

PID: 326      TASK: ffff95fec3cd8000  CPU: 29   COMMAND: "kworker/u98:10"
 #0 [ffffad8f8702f9e0] machine_kexec at ffffffff91c76ec7
 #1 [ffffad8f8702fa38] __crash_kexec at ffffffff91dea4fa
 #2 [ffffad8f8702faf8] crash_kexec at ffffffff91deb788
 #3 [ffffad8f8702fb00] oops_end at ffffffff91c2e4bb
 #4 [ffffad8f8702fb20] do_trap at ffffffff91c2a4ce
 #5 [ffffad8f8702fb70] do_error_trap at ffffffff91c2a595
 #6 [ffffad8f8702fbb0] exc_divide_error at ffffffff928506e6
 #7 [ffffad8f8702fbd0] asm_exc_divide_error at ffffffff92a00926
    [exception RIP: blk_stack_limits+434]
    RIP: ffffffff92191872  RSP: ffffad8f8702fc80  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff95efa0c91800  RCX: 0000000000000001
    RDX: 0000000000000000  RSI: 0000000000000001  RDI: 0000000000000001
    RBP: 00000000ffffffff   R8: ffff95fec7df35a8   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000001  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: ffff95fed33c09a8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffffad8f8702fce0] nvme_update_ns_info_block at ffffffffc06d3533 [nvme_core]
 #9 [ffffad8f8702fd18] nvme_scan_ns at ffffffffc06d6fa7 [nvme_core]

This happened when the check for valid data was moved out of nvme_identify_ns()
into one of the callers.  Fix this by checking in both callers.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=218186
Fixes: 0dd6fff ("nvme: bring back auto-removal of deleted namespaces during sequential scan")
Cc: [email protected]
Signed-off-by: Ewan D. Milne <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Dec 4, 2023
Currently DP driver is executed independent of PM runtime framework.
This leads msm eDP panel can not being detected by edp_panel driver
during generic_edp_panel_probe() due to AUX DPCD read failed at
edp panel driver. Incorporate PM runtime framework into DP driver so
that host controller's power and clocks are enable/disable through
PM runtime mechanism.  Once PM runtime framework is incorporated into
DP driver, waking up device from power up path is not necessary. Hence
remove it.

After incorporating pm_runtime framework into eDP/DP driver,
dp_pm_suspend() to handle power off both DP phy and controller during
suspend and dp_pm_resume() to handle power on both DP phy and controller
during resume are not necessary. Therefore both dp_pm_suspend() and
dp_pm_resume() are dropped and replace with dp_pm_runtime_suspend() and
dp_pm_runtime_resume() respectively.

Changes in v9:
-- silent compiler warning message at dp_power_init() and dp_power_deinit()
   with W1 flag

Changes in v7:
-- add comments to dp_pm_runtime_resume()
-- add comments to dp_bridge_hpd_enable()
-- delete dp->hpd_state = ST_DISCONNECTED from dp_bridge_hpd_notify()

Changes in v6:
-- delete dp_power_client_deinit(dp->power);
-- remove if (!dp->dp_display.is_edp) condition checkout at plug_handle()
-- remove if (!dp->dp_display.is_edp) condition checkout at unplug_handle()
-- add IRQF_NO_AUTOEN to devm_request_irq()
-- add enable_irq() and disable_irq() to pm_runtime_resume()/suspend()
-- del dp->hpd_state = ST_DISCONNECTED from dp_bridge_hpd_disable()

Changes in v5:
-- remove pm_runtime_put_autosuspend feature, use pm_runtime_put_sync()
-- squash add pm_runtime_force_suspend()/resume() patch into this patch

Changes in v4:
-- reworded commit text to explain why pm_framework is required for
   edp panel
-- reworded commit text to explain autosuspend is choiced
-- delete EV_POWER_PM_GET and PM_EV_POWER_PUT from changes #3
-- delete dp_display_pm_get() and dp_display_pm_Put() from changes #3
-- return value from pm_runtime_resume_and_get() directly
-- check return value of devm_pm_runtime_enable()
-- delete pm_runtime_xxx from dp_display_remove()
-- drop dp_display_host_init() from EV_HPD_INIT_SETUP
-- drop both dp_pm_prepare() and dp_pm_compete() from this change
-- delete ST_SUSPENDED state
-- rewording commit text to add more details regrading the purpose
   of this change

Changes in v3:
-- incorporate removing pm_runtime_xx() from dp_pwer.c to this patch
-- use pm_runtime_resume_and_get() instead of pm_runtime_get()
-- error checking pm_runtime_resume_and_get() return value
-- add EV_POWER_PM_GET and PM_EV_POWER_PUT to handle HPD_GPIO case
-- replace dp_pm_suspend() with pm_runtime_force_suspend()
-- replace dp_pm_resume() with pm_runtime_force_resume()

Signed-off-by: Kuogee Hsieh <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/570073/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Dec 13, 2023
Patch series "Multi-size THP for anonymous memory", v9.

A series to implement multi-size THP (mTHP) for anonymous memory
(previously called "small-sized THP" and "large anonymous folios").

The objective of this is to improve performance by allocating larger
chunks of memory during anonymous page faults:

1) Since SW (the kernel) is dealing with larger chunks of memory than base
   pages, there are efficiency savings to be had; fewer page faults, batched PTE
   and RMAP manipulation, reduced lru list, etc. In short, we reduce kernel
   overhead. This should benefit all architectures.
2) Since we are now mapping physically contiguous chunks of memory, we can take
   advantage of HW TLB compression techniques. A reduction in TLB pressure
   speeds up kernel and user space. arm64 systems have 2 mechanisms to coalesce
   TLB entries; "the contiguous bit" (architectural) and HPA (uarch).

This version incorporates David's feedback on the core patches (#3, #4)
and adds some RB and TB tags (see change log for details).

By default, the existing behaviour (and performance) is maintained.  The
user must explicitly enable multi-size THP to see the performance benefit.
This is done via a new sysfs interface (as recommended by David
Hildenbrand - thanks to David for the suggestion)!  This interface is
inspired by the existing per-hugepage-size sysfs interface used by
hugetlb, provides full backwards compatibility with the existing PMD-size
THP interface, and provides a base for future extensibility.  See [9] for
detailed discussion of the interface.

This series is based on mm-unstable (715b67adf4c8).


Prerequisites
=============

I'm removing this section on the basis that I don't believe what we were
previously calling prerequisites are really prerequisites anymore.  We
originally defined them when mTHP was a compile-time feature.  There is
now a runtime control to opt-in to mTHP; when disabled, correctness and
performance are as before.  When enabled, the code is still
correct/robust, but in the absence of the one remaining item (compaction)
there may be a performance impact in some corners.  See the old list in
the v8 cover letter at [8].  And a longer explanation of my thinking here
[10].

SUMMARY: I don't think we should hold this series up, waiting for the
items on the prerequisites list.  I believe this series should be ready
now so hopefully can be added to mm-unstable for some testing, then
fingers crossed for v6.8.


Testing
=======

The series includes patches for mm selftests to enlighten the cow and
khugepaged tests to explicitly test with multi-size THP, in the same way
that PMD-sized THP is tested.  The new tests all pass, and no regressions
are observed in the mm selftest suite.  I've also run my usual kernel
compilation and java script benchmarks without any issues.

Refer to my performance numbers posted with v6 [6].  (These are for
multi-size THP only - they do not include the arm64 contpte follow-on
series).

John Hubbard at Nvidia has indicated dramatic 10x performance improvements
for some workloads at [11].  (Observed using v6 of this series as well as
the arm64 contpte series).

Kefeng Wang at Huawei has also indicated he sees improvements at [12] although
there are some latency regressions also.

I've also checked that there is no regression in the write fault path when
mTHP is disabled using a microbenchmark.  I ran it for a baseline kernel,
as well as v8 and v9.  I repeated on Ampere Altra (bare metal) and Apple
M2 (VM):

|              |        m2 vm        |        altra        |
|--------------|---------------------|---------------------|
| kernel       |     mean |  std_rel |     mean |  std_rel |
|--------------|----------|----------|----------|----------|
| baseline     |   0.000% |   0.341% |   0.000% |   3.581% |
| anonfolio-v8 |   0.005% |   0.272% |   5.068% |   1.128% |
| anonfolio-v9 |  -0.013% |   0.442% |   0.107% |   1.788% |

There is no measurable difference on M2, but altra has a slow down in v8
which is fixed in v9 by moving the THP order check to be inline within
thp_vma_allowable_orders(), as suggested by David.


This patch (of 10):

In preparation for the introduction of anonymous multi-size THP, we would
like to be able to split them when they have unmapped subpages, in order
to free those unused pages under memory pressure.  So remove the
artificial requirement that the large folio needed to be at least
PMD-sized.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryan Roberts <[email protected]>
Reviewed-by: Yu Zhao <[email protected]>
Reviewed-by: Yin Fengwei <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Barry Song <[email protected]>
Tested-by: Kefeng Wang <[email protected]>
Tested-by: John Hubbard <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Itaru Kitayama <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Dec 15, 2023
Releasing the `priv->lock` while iterating the `priv->multicast_list` in
`ipoib_mcast_join_task()` opens a window for `ipoib_mcast_dev_flush()` to
remove the items while in the middle of iteration. If the mcast is removed
while the lock was dropped, the for loop spins forever resulting in a hard
lockup (as was reported on RHEL 4.18.0-372.75.1.el8_6 kernel):

    Task A (kworker/u72:2 below)       | Task B (kworker/u72:0 below)
    -----------------------------------+-----------------------------------
    ipoib_mcast_join_task(work)        | ipoib_ib_dev_flush_light(work)
      spin_lock_irq(&priv->lock)       | __ipoib_ib_dev_flush(priv, ...)
      list_for_each_entry(mcast,       | ipoib_mcast_dev_flush(dev = priv->dev)
          &priv->multicast_list, list) |
        ipoib_mcast_join(dev, mcast)   |
          spin_unlock_irq(&priv->lock) |
                                       |   spin_lock_irqsave(&priv->lock, flags)
                                       |   list_for_each_entry_safe(mcast, tmcast,
                                       |                  &priv->multicast_list, list)
                                       |     list_del(&mcast->list);
                                       |     list_add_tail(&mcast->list, &remove_list)
                                       |   spin_unlock_irqrestore(&priv->lock, flags)
          spin_lock_irq(&priv->lock)   |
                                       |   ipoib_mcast_remove_list(&remove_list)
   (Here, `mcast` is no longer on the  |     list_for_each_entry_safe(mcast, tmcast,
    `priv->multicast_list` and we keep |                            remove_list, list)
    spinning on the `remove_list` of   |  >>>  wait_for_completion(&mcast->done)
    the other thread which is blocked  |
    and the list is still valid on     |
    it's stack.)

Fix this by keeping the lock held and changing to GFP_ATOMIC to prevent
eventual sleeps.
Unfortunately we could not reproduce the lockup and confirm this fix but
based on the code review I think this fix should address such lockups.

crash> bc 31
PID: 747      TASK: ff1c6a1a007e8000  CPU: 31   COMMAND: "kworker/u72:2"
--
    [exception RIP: ipoib_mcast_join_task+0x1b1]
    RIP: ffffffffc0944ac1  RSP: ff646f199a8c7e00  RFLAGS: 00000002
    RAX: 0000000000000000  RBX: ff1c6a1a04dc82f8  RCX: 0000000000000000
                                  work (&priv->mcast_task{,.work})
    RDX: ff1c6a192d60ac68  RSI: 0000000000000286  RDI: ff1c6a1a04dc8000
           &mcast->list
    RBP: ff646f199a8c7e90   R8: ff1c699980019420   R9: ff1c6a1920c9a000
    R10: ff646f199a8c7e00  R11: ff1c6a191a7d9800  R12: ff1c6a192d60ac00
                                                         mcast
    R13: ff1c6a1d82200000  R14: ff1c6a1a04dc8000  R15: ff1c6a1a04dc82d8
           dev                    priv (&priv->lock)     &priv->multicast_list (aka head)
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ff646f199a8c7e00] ipoib_mcast_join_task+0x1b1 at ffffffffc0944ac1 [ib_ipoib]
 #6 [ff646f199a8c7e98] process_one_work+0x1a7 at ffffffff9bf10967

crash> rx ff646f199a8c7e68
ff646f199a8c7e68:  ff1c6a1a04dc82f8 <<< work = &priv->mcast_task.work

crash> list -hO ipoib_dev_priv.multicast_list ff1c6a1a04dc8000
(empty)

crash> ipoib_dev_priv.mcast_task.work.func,mcast_mutex.owner.counter ff1c6a1a04dc8000
  mcast_task.work.func = 0xffffffffc0944910 <ipoib_mcast_join_task>,
  mcast_mutex.owner.counter = 0xff1c69998efec000

crash> b 8
PID: 8        TASK: ff1c69998efec000  CPU: 33   COMMAND: "kworker/u72:0"
--
 #3 [ff646f1980153d50] wait_for_completion+0x96 at ffffffff9c7d7646
 #4 [ff646f1980153d90] ipoib_mcast_remove_list+0x56 at ffffffffc0944dc6 [ib_ipoib]
 #5 [ff646f1980153de8] ipoib_mcast_dev_flush+0x1a7 at ffffffffc09455a7 [ib_ipoib]
 #6 [ff646f1980153e58] __ipoib_ib_dev_flush+0x1a4 at ffffffffc09431a4 [ib_ipoib]
 #7 [ff646f1980153e98] process_one_work+0x1a7 at ffffffff9bf10967

crash> rx ff646f1980153e68
ff646f1980153e68:  ff1c6a1a04dc83f0 <<< work = &priv->flush_light

crash> ipoib_dev_priv.flush_light.func,broadcast ff1c6a1a04dc8000
  flush_light.func = 0xffffffffc0943820 <ipoib_ib_dev_flush_light>,
  broadcast = 0x0,

The mcast(s) on the `remove_list` (the remaining part of the ex `priv->multicast_list`):

crash> list -s ipoib_mcast.done.done ipoib_mcast.list -H ff646f1980153e10 | paste - -
ff1c6a192bd0c200          done.done = 0x0,
ff1c6a192d60ac00          done.done = 0x0,

Reported-by: Yuya Fujita-bishamonten <[email protected]>
Signed-off-by: Daniel Vacek <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Dec 15, 2023
virtio-gpu kernel driver reports 16 for count_crtcs
which exceeds IGT_MAX_PIPES set to 8 in igt-gpu-tools.
This results in below memory corruption,

 malloc(): corrupted top size
 Received signal SIGABRT.
 Stack trace:
  #0 [fatal_sig_handler+0x17b]
  #1 [__sigaction+0x40]
  #2 [pthread_key_delete+0x14c]
  #3 [gsignal+0x12]
  #4 [abort+0xd3]
  #5 [__fsetlocking+0x290]
  #6 [timer_settime+0x37a]
  #7 [__default_morecore+0x1f1b]
  #8 [__libc_calloc+0x161]
  #9 [drmModeGetPlaneResources+0x44]
  #10 [igt_display_require+0x194]
  #11 [__igt_unique____real_main1356+0x93c]
  #12 [main+0x3f]
  #13 [__libc_init_first+0x8a]
  #14 [__libc_start_main+0x85]
  #15 [_start+0x21]
 
This is fixed in igt-gpu-tools by increasing IGT_MAX_PIPES to 16.  
https://patchwork.freedesktop.org/series/126327/
 
Uprev IGT to include the patches which fixes this issue.

Acked-by: Helen Koike <[email protected]>
Signed-off-by: Vignesh Raman <[email protected]>
Signed-off-by: Helen Koike <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
ColinIanKing pushed a commit that referenced this pull request Dec 15, 2023
Patch series "Multi-size THP for anonymous memory", v9.

A series to implement multi-size THP (mTHP) for anonymous memory
(previously called "small-sized THP" and "large anonymous folios").

The objective of this is to improve performance by allocating larger
chunks of memory during anonymous page faults:

1) Since SW (the kernel) is dealing with larger chunks of memory than base
   pages, there are efficiency savings to be had; fewer page faults, batched PTE
   and RMAP manipulation, reduced lru list, etc. In short, we reduce kernel
   overhead. This should benefit all architectures.
2) Since we are now mapping physically contiguous chunks of memory, we can take
   advantage of HW TLB compression techniques. A reduction in TLB pressure
   speeds up kernel and user space. arm64 systems have 2 mechanisms to coalesce
   TLB entries; "the contiguous bit" (architectural) and HPA (uarch).

This version incorporates David's feedback on the core patches (#3, #4)
and adds some RB and TB tags (see change log for details).

By default, the existing behaviour (and performance) is maintained.  The
user must explicitly enable multi-size THP to see the performance benefit.
This is done via a new sysfs interface (as recommended by David
Hildenbrand - thanks to David for the suggestion)!  This interface is
inspired by the existing per-hugepage-size sysfs interface used by
hugetlb, provides full backwards compatibility with the existing PMD-size
THP interface, and provides a base for future extensibility.  See [9] for
detailed discussion of the interface.

This series is based on mm-unstable (715b67adf4c8).


Prerequisites
=============

I'm removing this section on the basis that I don't believe what we were
previously calling prerequisites are really prerequisites anymore.  We
originally defined them when mTHP was a compile-time feature.  There is
now a runtime control to opt-in to mTHP; when disabled, correctness and
performance are as before.  When enabled, the code is still
correct/robust, but in the absence of the one remaining item (compaction)
there may be a performance impact in some corners.  See the old list in
the v8 cover letter at [8].  And a longer explanation of my thinking here
[10].

SUMMARY: I don't think we should hold this series up, waiting for the
items on the prerequisites list.  I believe this series should be ready
now so hopefully can be added to mm-unstable for some testing, then
fingers crossed for v6.8.


Testing
=======

The series includes patches for mm selftests to enlighten the cow and
khugepaged tests to explicitly test with multi-size THP, in the same way
that PMD-sized THP is tested.  The new tests all pass, and no regressions
are observed in the mm selftest suite.  I've also run my usual kernel
compilation and java script benchmarks without any issues.

Refer to my performance numbers posted with v6 [6].  (These are for
multi-size THP only - they do not include the arm64 contpte follow-on
series).

John Hubbard at Nvidia has indicated dramatic 10x performance improvements
for some workloads at [11].  (Observed using v6 of this series as well as
the arm64 contpte series).

Kefeng Wang at Huawei has also indicated he sees improvements at [12] although
there are some latency regressions also.

I've also checked that there is no regression in the write fault path when
mTHP is disabled using a microbenchmark.  I ran it for a baseline kernel,
as well as v8 and v9.  I repeated on Ampere Altra (bare metal) and Apple
M2 (VM):

|              |        m2 vm        |        altra        |
|--------------|---------------------|---------------------|
| kernel       |     mean |  std_rel |     mean |  std_rel |
|--------------|----------|----------|----------|----------|
| baseline     |   0.000% |   0.341% |   0.000% |   3.581% |
| anonfolio-v8 |   0.005% |   0.272% |   5.068% |   1.128% |
| anonfolio-v9 |  -0.013% |   0.442% |   0.107% |   1.788% |

There is no measurable difference on M2, but altra has a slow down in v8
which is fixed in v9 by moving the THP order check to be inline within
thp_vma_allowable_orders(), as suggested by David.


This patch (of 10):

In preparation for the introduction of anonymous multi-size THP, we would
like to be able to split them when they have unmapped subpages, in order
to free those unused pages under memory pressure.  So remove the
artificial requirement that the large folio needed to be at least
PMD-sized.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryan Roberts <[email protected]>
Reviewed-by: Yu Zhao <[email protected]>
Reviewed-by: Yin Fengwei <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Barry Song <[email protected]>
Tested-by: Kefeng Wang <[email protected]>
Tested-by: John Hubbard <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Itaru Kitayama <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Dec 19, 2023
Petr Machata says:

====================
mlxsw: CFF flood mode: NVE underlay configuration

Recently, support for CFF flood mode (for Compressed FID Flooding) was
added to the mlxsw driver. The most recent patchset has a detailed coverage
of what CFF is and what has changed and how:

    https://lore.kernel.org/netdev/[email protected]/

In CFF flood mode, each FID allocates a handful (in our implementation two
or three) consecutive PGT entries. One entry holds the flood vector for
unknown-UC traffic, one for MC, one for BC.

To determine how to look up flood vectors, the CFF flood mode uses a
concept of flood profiles, which are IDs that reference mappings from
traffic types to offsets. In the case of CFF flood mode, the offset in
question is applied to the PGT address configured at a FID. The same
mechanism is used by NVE underlay for flooding. Again the profile ID and
the traffic type determine the offset to apply, this time to KVD address
used to look up flooding entries. Since mlxsw configures NVE underlay flood
the same regardless of traffic type, only one offset was ever needed: the
zero, which is the default, and thus no explicit configuration was needed.

Now that CFF uses profiles as well, it would be better to configure the
profile used by NVE explicitly, to make the configuration visible in the
source code.

In this patchset, add the register support (in patch #1), add a new traffic
type to refer to "any traffic at all" (in patch #2) and finally configure
the NVE profile explicitly for FIDs (in patch #3).

So far, the implicitly configured flood profile was the ID 0. With this
patchset, it changes to 3, leaving the 0 free to allow us to spot missed
configuration.
====================

Signed-off-by: David S. Miller <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Dec 19, 2023
syzbot found a potential circular dependency leading to a deadlock:
    -> #3 (&hdev->req_lock){+.+.}-{3:3}:
    __mutex_lock_common+0x1b6/0x1bc2 kernel/locking/mutex.c:599
    __mutex_lock kernel/locking/mutex.c:732 [inline]
    mutex_lock_nested+0x17/0x1c kernel/locking/mutex.c:784
    hci_dev_do_close+0x3f/0x9f net/bluetooth/hci_core.c:551
    hci_rfkill_set_block+0x130/0x1ac net/bluetooth/hci_core.c:935
    rfkill_set_block+0x1e6/0x3b8 net/rfkill/core.c:345
    rfkill_fop_write+0x2d8/0x672 net/rfkill/core.c:1274
    vfs_write+0x277/0xcf5 fs/read_write.c:594
    ksys_write+0x19b/0x2bd fs/read_write.c:650
    do_syscall_x64 arch/x86/entry/common.c:55 [inline]
    do_syscall_64+0x51/0xba arch/x86/entry/common.c:93
    entry_SYSCALL_64_after_hwframe+0x61/0xcb

    -> #2 (rfkill_global_mutex){+.+.}-{3:3}:
    __mutex_lock_common+0x1b6/0x1bc2 kernel/locking/mutex.c:599
    __mutex_lock kernel/locking/mutex.c:732 [inline]
    mutex_lock_nested+0x17/0x1c kernel/locking/mutex.c:784
    rfkill_register+0x30/0x7e3 net/rfkill/core.c:1045
    hci_register_dev+0x48f/0x96d net/bluetooth/hci_core.c:2622
    __vhci_create_device drivers/bluetooth/hci_vhci.c:341 [inline]
    vhci_create_device+0x3ad/0x68f drivers/bluetooth/hci_vhci.c:374
    vhci_get_user drivers/bluetooth/hci_vhci.c:431 [inline]
    vhci_write+0x37b/0x429 drivers/bluetooth/hci_vhci.c:511
    call_write_iter include/linux/fs.h:2109 [inline]
    new_sync_write fs/read_write.c:509 [inline]
    vfs_write+0xaa8/0xcf5 fs/read_write.c:596
    ksys_write+0x19b/0x2bd fs/read_write.c:650
    do_syscall_x64 arch/x86/entry/common.c:55 [inline]
    do_syscall_64+0x51/0xba arch/x86/entry/common.c:93
    entry_SYSCALL_64_after_hwframe+0x61/0xcb

    -> #1 (&data->open_mutex){+.+.}-{3:3}:
    __mutex_lock_common+0x1b6/0x1bc2 kernel/locking/mutex.c:599
    __mutex_lock kernel/locking/mutex.c:732 [inline]
    mutex_lock_nested+0x17/0x1c kernel/locking/mutex.c:784
    vhci_send_frame+0x68/0x9c drivers/bluetooth/hci_vhci.c:75
    hci_send_frame+0x1cc/0x2ff net/bluetooth/hci_core.c:2989
    hci_sched_acl_pkt net/bluetooth/hci_core.c:3498 [inline]
    hci_sched_acl net/bluetooth/hci_core.c:3583 [inline]
    hci_tx_work+0xb94/0x1a60 net/bluetooth/hci_core.c:3654
    process_one_work+0x901/0xfb8 kernel/workqueue.c:2310
    worker_thread+0xa67/0x1003 kernel/workqueue.c:2457
    kthread+0x36a/0x430 kernel/kthread.c:319
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298

    -> #0 ((work_completion)(&hdev->tx_work)){+.+.}-{0:0}:
    check_prev_add kernel/locking/lockdep.c:3053 [inline]
    check_prevs_add kernel/locking/lockdep.c:3172 [inline]
    validate_chain kernel/locking/lockdep.c:3787 [inline]
    __lock_acquire+0x2d32/0x77fa kernel/locking/lockdep.c:5011
    lock_acquire+0x273/0x4d5 kernel/locking/lockdep.c:5622
    __flush_work+0xee/0x19f kernel/workqueue.c:3090
    hci_dev_close_sync+0x32f/0x1113 net/bluetooth/hci_sync.c:4352
    hci_dev_do_close+0x47/0x9f net/bluetooth/hci_core.c:553
    hci_rfkill_set_block+0x130/0x1ac net/bluetooth/hci_core.c:935
    rfkill_set_block+0x1e6/0x3b8 net/rfkill/core.c:345
    rfkill_fop_write+0x2d8/0x672 net/rfkill/core.c:1274
    vfs_write+0x277/0xcf5 fs/read_write.c:594
    ksys_write+0x19b/0x2bd fs/read_write.c:650
    do_syscall_x64 arch/x86/entry/common.c:55 [inline]
    do_syscall_64+0x51/0xba arch/x86/entry/common.c:93
    entry_SYSCALL_64_after_hwframe+0x61/0xcb

This change removes the need for acquiring the open_mutex in
vhci_send_frame, thus eliminating the potential deadlock while
maintaining the required packet ordering.

Fixes: 92d4abd ("Bluetooth: vhci: Fix race when opening vhci device")
Signed-off-by: Ying Hsu <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Dec 19, 2023
Patch series "Multi-size THP for anonymous memory", v9.

A series to implement multi-size THP (mTHP) for anonymous memory
(previously called "small-sized THP" and "large anonymous folios").

The objective of this is to improve performance by allocating larger
chunks of memory during anonymous page faults:

1) Since SW (the kernel) is dealing with larger chunks of memory than base
   pages, there are efficiency savings to be had; fewer page faults, batched PTE
   and RMAP manipulation, reduced lru list, etc. In short, we reduce kernel
   overhead. This should benefit all architectures.
2) Since we are now mapping physically contiguous chunks of memory, we can take
   advantage of HW TLB compression techniques. A reduction in TLB pressure
   speeds up kernel and user space. arm64 systems have 2 mechanisms to coalesce
   TLB entries; "the contiguous bit" (architectural) and HPA (uarch).

This version incorporates David's feedback on the core patches (#3, #4)
and adds some RB and TB tags (see change log for details).

By default, the existing behaviour (and performance) is maintained.  The
user must explicitly enable multi-size THP to see the performance benefit.
This is done via a new sysfs interface (as recommended by David
Hildenbrand - thanks to David for the suggestion)!  This interface is
inspired by the existing per-hugepage-size sysfs interface used by
hugetlb, provides full backwards compatibility with the existing PMD-size
THP interface, and provides a base for future extensibility.  See [9] for
detailed discussion of the interface.

This series is based on mm-unstable (715b67adf4c8).


Prerequisites
=============

I'm removing this section on the basis that I don't believe what we were
previously calling prerequisites are really prerequisites anymore.  We
originally defined them when mTHP was a compile-time feature.  There is
now a runtime control to opt-in to mTHP; when disabled, correctness and
performance are as before.  When enabled, the code is still
correct/robust, but in the absence of the one remaining item (compaction)
there may be a performance impact in some corners.  See the old list in
the v8 cover letter at [8].  And a longer explanation of my thinking here
[10].

SUMMARY: I don't think we should hold this series up, waiting for the
items on the prerequisites list.  I believe this series should be ready
now so hopefully can be added to mm-unstable for some testing, then
fingers crossed for v6.8.


Testing
=======

The series includes patches for mm selftests to enlighten the cow and
khugepaged tests to explicitly test with multi-size THP, in the same way
that PMD-sized THP is tested.  The new tests all pass, and no regressions
are observed in the mm selftest suite.  I've also run my usual kernel
compilation and java script benchmarks without any issues.

Refer to my performance numbers posted with v6 [6].  (These are for
multi-size THP only - they do not include the arm64 contpte follow-on
series).

John Hubbard at Nvidia has indicated dramatic 10x performance improvements
for some workloads at [11].  (Observed using v6 of this series as well as
the arm64 contpte series).

Kefeng Wang at Huawei has also indicated he sees improvements at [12] although
there are some latency regressions also.

I've also checked that there is no regression in the write fault path when
mTHP is disabled using a microbenchmark.  I ran it for a baseline kernel,
as well as v8 and v9.  I repeated on Ampere Altra (bare metal) and Apple
M2 (VM):

|              |        m2 vm        |        altra        |
|--------------|---------------------|---------------------|
| kernel       |     mean |  std_rel |     mean |  std_rel |
|--------------|----------|----------|----------|----------|
| baseline     |   0.000% |   0.341% |   0.000% |   3.581% |
| anonfolio-v8 |   0.005% |   0.272% |   5.068% |   1.128% |
| anonfolio-v9 |  -0.013% |   0.442% |   0.107% |   1.788% |

There is no measurable difference on M2, but altra has a slow down in v8
which is fixed in v9 by moving the THP order check to be inline within
thp_vma_allowable_orders(), as suggested by David.


This patch (of 10):

In preparation for the introduction of anonymous multi-size THP, we would
like to be able to split them when they have unmapped subpages, in order
to free those unused pages under memory pressure.  So remove the
artificial requirement that the large folio needed to be at least
PMD-sized.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryan Roberts <[email protected]>
Reviewed-by: Yu Zhao <[email protected]>
Reviewed-by: Yin Fengwei <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Barry Song <[email protected]>
Tested-by: Kefeng Wang <[email protected]>
Tested-by: John Hubbard <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Itaru Kitayama <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Dec 19, 2023
Calling led_trigger_register() when attaching a PHY located on an SFP
module potentially (and practically) leads into a deadlock.
Fix this by not calling led_trigger_register() for PHYs localted on SFP
modules as such modules actually never got any LEDs.

======================================================
WARNING: possible circular locking dependency detected
6.7.0-rc4-next-20231208+ #0 Tainted: G           O
------------------------------------------------------
kworker/u8:2/43 is trying to acquire lock:
ffffffc08108c4e8 (triggers_list_lock){++++}-{3:3}, at: led_trigger_register+0x4c/0x1a8

but task is already holding lock:
ffffff80c5c6f318 (&sfp->sm_mutex){+.+.}-{3:3}, at: cleanup_module+0x2ba8/0x3120 [sfp]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&sfp->sm_mutex){+.+.}-{3:3}:
       __mutex_lock+0x88/0x7a0
       mutex_lock_nested+0x20/0x28
       cleanup_module+0x2ae0/0x3120 [sfp]
       sfp_register_bus+0x5c/0x9c
       sfp_register_socket+0x48/0xd4
       cleanup_module+0x271c/0x3120 [sfp]
       platform_probe+0x64/0xb8
       really_probe+0x17c/0x3c0
       __driver_probe_device+0x78/0x164
       driver_probe_device+0x3c/0xd4
       __driver_attach+0xec/0x1f0
       bus_for_each_dev+0x60/0xa0
       driver_attach+0x20/0x28
       bus_add_driver+0x108/0x208
       driver_register+0x5c/0x118
       __platform_driver_register+0x24/0x2c
       init_module+0x28/0xa7c [sfp]
       do_one_initcall+0x70/0x2ec
       do_init_module+0x54/0x1e4
       load_module+0x1b78/0x1c8c
       __do_sys_init_module+0x1bc/0x2cc
       __arm64_sys_init_module+0x18/0x20
       invoke_syscall.constprop.0+0x4c/0xdc
       do_el0_svc+0x3c/0xbc
       el0_svc+0x34/0x80
       el0t_64_sync_handler+0xf8/0x124
       el0t_64_sync+0x150/0x154

-> #2 (rtnl_mutex){+.+.}-{3:3}:
       __mutex_lock+0x88/0x7a0
       mutex_lock_nested+0x20/0x28
       rtnl_lock+0x18/0x20
       set_device_name+0x30/0x130
       netdev_trig_activate+0x13c/0x1ac
       led_trigger_set+0x118/0x234
       led_trigger_write+0x104/0x17c
       sysfs_kf_bin_write+0x64/0x80
       kernfs_fop_write_iter+0x128/0x1b4
       vfs_write+0x178/0x2a4
       ksys_write+0x58/0xd4
       __arm64_sys_write+0x18/0x20
       invoke_syscall.constprop.0+0x4c/0xdc
       do_el0_svc+0x3c/0xbc
       el0_svc+0x34/0x80
       el0t_64_sync_handler+0xf8/0x124
       el0t_64_sync+0x150/0x154

-> #1 (&led_cdev->trigger_lock){++++}-{3:3}:
       down_write+0x4c/0x13c
       led_trigger_write+0xf8/0x17c
       sysfs_kf_bin_write+0x64/0x80
       kernfs_fop_write_iter+0x128/0x1b4
       vfs_write+0x178/0x2a4
       ksys_write+0x58/0xd4
       __arm64_sys_write+0x18/0x20
       invoke_syscall.constprop.0+0x4c/0xdc
       do_el0_svc+0x3c/0xbc
       el0_svc+0x34/0x80
       el0t_64_sync_handler+0xf8/0x124
       el0t_64_sync+0x150/0x154

-> #0 (triggers_list_lock){++++}-{3:3}:
       __lock_acquire+0x12a0/0x2014
       lock_acquire+0x100/0x2ac
       down_write+0x4c/0x13c
       led_trigger_register+0x4c/0x1a8
       phy_led_triggers_register+0x9c/0x214
       phy_attach_direct+0x154/0x36c
       phylink_attach_phy+0x30/0x60
       phylink_sfp_connect_phy+0x140/0x510
       sfp_add_phy+0x34/0x50
       init_module+0x15c/0xa7c [sfp]
       cleanup_module+0x1d94/0x3120 [sfp]
       cleanup_module+0x2bb4/0x3120 [sfp]
       process_one_work+0x1f8/0x4ec
       worker_thread+0x1e8/0x3d8
       kthread+0x104/0x110
       ret_from_fork+0x10/0x20

other info that might help us debug this:

Chain exists of:
  triggers_list_lock --> rtnl_mutex --> &sfp->sm_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&sfp->sm_mutex);
                               lock(rtnl_mutex);
                               lock(&sfp->sm_mutex);
  lock(triggers_list_lock);

 *** DEADLOCK ***

4 locks held by kworker/u8:2/43:
 #0: ffffff80c000f938 ((wq_completion)events_power_efficient){+.+.}-{0:0}, at: process_one_work+0x150/0x4ec
 #1: ffffffc08214bde8 ((work_completion)(&(&sfp->timeout)->work)){+.+.}-{0:0}, at: process_one_work+0x150/0x4ec
 #2: ffffffc0810902f8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x18/0x20
 #3: ffffff80c5c6f318 (&sfp->sm_mutex){+.+.}-{3:3}, at: cleanup_module+0x2ba8/0x3120 [sfp]

stack backtrace:
CPU: 0 PID: 43 Comm: kworker/u8:2 Tainted: G           O       6.7.0-rc4-next-20231208+ #0
Hardware name: Bananapi BPI-R4 (DT)
Workqueue: events_power_efficient cleanup_module [sfp]
Call trace:
 dump_backtrace+0xa8/0x10c
 show_stack+0x14/0x1c
 dump_stack_lvl+0x5c/0xa0
 dump_stack+0x14/0x1c
 print_circular_bug+0x328/0x430
 check_noncircular+0x124/0x134
 __lock_acquire+0x12a0/0x2014
 lock_acquire+0x100/0x2ac
 down_write+0x4c/0x13c
 led_trigger_register+0x4c/0x1a8
 phy_led_triggers_register+0x9c/0x214
 phy_attach_direct+0x154/0x36c
 phylink_attach_phy+0x30/0x60
 phylink_sfp_connect_phy+0x140/0x510
 sfp_add_phy+0x34/0x50
 init_module+0x15c/0xa7c [sfp]
 cleanup_module+0x1d94/0x3120 [sfp]
 cleanup_module+0x2bb4/0x3120 [sfp]
 process_one_work+0x1f8/0x4ec
 worker_thread+0x1e8/0x3d8
 kthread+0x104/0x110
 ret_from_fork+0x10/0x20

Signed-off-by: Daniel Golle <[email protected]>
Fixes: 01e5b72 ("net: phy: Add a binding for PHY LEDs")
Link: https://lore.kernel.org/r/102a9dce38bdf00215735d04cd4704458273ad9c.1702339354.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Dec 20, 2023
Jiri Pirko says:

====================
devlink: introduce notifications filtering

From: Jiri Pirko <[email protected]>

Currently the user listening on a socket for devlink notifications
gets always all messages for all existing devlink instances and objects,
even if he is interested only in one of those. That may cause
unnecessary overhead on setups with thousands of instances present.

User is currently able to narrow down the devlink objects replies
to dump commands by specifying select attributes.

Allow similar approach for notifications providing user a new
notify-filter-set command to select attributes with values
the notification message has to match. In that case, it is delivered
to the socket.

Note that the filtering is done per-socket, so multiple users may
specify different selection of attributes with values.

This patchset initially introduces support for following attributes:
DEVLINK_ATTR_BUS_NAME
DEVLINK_ATTR_DEV_NAME
DEVLINK_ATTR_PORT_INDEX

Patches #1 - #4 are preparations in devlink code, patch #3 is
                an optimization done on the way.
Patches #5 - #7 are preparations in netlink and generic netlink code.
Patch #8 is the main one in this set implementing of
         the notify-filter-set command and the actual
         per-socket filtering.
Patch #9 extends the infrastructure allowing to filter according
         to a port index.

Example:
$ devlink mon port pci/0000:08:00.0/32768
[port,new] pci/0000:08:00.0/32768: type notset flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
[port,new] pci/0000:08:00.0/32768: type eth flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
[port,new] pci/0000:08:00.0/32768: type eth netdev eth3 flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
[port,new] pci/0000:08:00.0/32768: type eth netdev eth3 flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
[port,new] pci/0000:08:00.0/32768: type eth flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
[port,new] pci/0000:08:00.0/32768: type notset flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
[port,del] pci/0000:08:00.0/32768: type notset flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Jul 17, 2024
Add a set of tests to validate that stack traces captured from or in the
presence of active uprobes and uretprobes are valid and complete.

For this we use BPF program that are installed either on entry or exit
of user function, plus deep-nested USDT. One of target funtions
(target_1) is recursive to generate two different entries in the stack
trace for the same uprobe/uretprobe, testing potential edge conditions.

If there is no fixes, we get something like this for one of the scenarios:

 caller: 0x758fff - 0x7595ab
 target_1: 0x758fd5 - 0x758fff
 target_2: 0x758fca - 0x758fd5
 target_3: 0x758fbf - 0x758fca
 target_4: 0x758fb3 - 0x758fbf
 ENTRY #0: 0x758fb3 (in target_4)
 ENTRY #1: 0x758fd3 (in target_2)
 ENTRY #2: 0x758ffd (in target_1)
 ENTRY #3: 0x7fffffffe000
 ENTRY #4: 0x7fffffffe000
 ENTRY #5: 0x6f8f39
 ENTRY #6: 0x6fa6f0
 ENTRY #7: 0x7f403f229590

Entry #3 and #4 (0x7fffffffe000) are uretprobe trampoline addresses
which obscure actual target_1 and another target_1 invocations. Also
note that between entry #0 and entry #1 we are missing an entry for
target_3.

With fixes, we get desired full stack traces:

 caller: 0x758fff - 0x7595ab
 target_1: 0x758fd5 - 0x758fff
 target_2: 0x758fca - 0x758fd5
 target_3: 0x758fbf - 0x758fca
 target_4: 0x758fb3 - 0x758fbf
 ENTRY #0: 0x758fb7 (in target_4)
 ENTRY #1: 0x758fc8 (in target_3)
 ENTRY #2: 0x758fd3 (in target_2)
 ENTRY #3: 0x758ffd (in target_1)
 ENTRY #4: 0x758ff3 (in target_1)
 ENTRY #5: 0x75922c (in caller)
 ENTRY #6: 0x6f8f39
 ENTRY #7: 0x6fa6f0
 ENTRY #8: 0x7f986adc4cd0

Now there is a logical and complete sequence of function calls.

Link: https://lore.kernel.org/all/[email protected]/

Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Jul 17, 2024
…_child().

syzkaller reported KMSAN splat in tcp_create_openreq_child(). [0]

The uninit variable is tcp_rsk(req)->ao_keyid.

tcp_rsk(req)->ao_keyid is initialised only when tcp_conn_request() finds
a valid TCP AO option in SYN.  Then, tcp_rsk(req)->used_tcp_ao is set
accordingly.

Let's not read tcp_rsk(req)->ao_keyid when tcp_rsk(req)->used_tcp_ao is
false.

[0]:
BUG: KMSAN: uninit-value in tcp_create_openreq_child+0x198b/0x1ff0 net/ipv4/tcp_minisocks.c:610
 tcp_create_openreq_child+0x198b/0x1ff0 net/ipv4/tcp_minisocks.c:610
 tcp_v4_syn_recv_sock+0x18e/0x2170 net/ipv4/tcp_ipv4.c:1754
 tcp_check_req+0x1a3e/0x20c0 net/ipv4/tcp_minisocks.c:852
 tcp_v4_rcv+0x26a4/0x53a0 net/ipv4/tcp_ipv4.c:2265
 ip_protocol_deliver_rcu+0x884/0x1270 net/ipv4/ip_input.c:205
 ip_local_deliver_finish+0x30f/0x530 net/ipv4/ip_input.c:233
 NF_HOOK include/linux/netfilter.h:314 [inline]
 ip_local_deliver+0x230/0x4c0 net/ipv4/ip_input.c:254
 dst_input include/net/dst.h:460 [inline]
 ip_sublist_rcv_finish net/ipv4/ip_input.c:580 [inline]
 ip_list_rcv_finish net/ipv4/ip_input.c:631 [inline]
 ip_sublist_rcv+0x10f7/0x13e0 net/ipv4/ip_input.c:639
 ip_list_rcv+0x952/0x9c0 net/ipv4/ip_input.c:674
 __netif_receive_skb_list_ptype net/core/dev.c:5703 [inline]
 __netif_receive_skb_list_core+0xd92/0x11d0 net/core/dev.c:5751
 __netif_receive_skb_list net/core/dev.c:5803 [inline]
 netif_receive_skb_list_internal+0xd8f/0x1350 net/core/dev.c:5895
 gro_normal_list include/net/gro.h:515 [inline]
 napi_complete_done+0x3f2/0x990 net/core/dev.c:6246
 e1000_clean+0x1fa4/0x5e50 drivers/net/ethernet/intel/e1000/e1000_main.c:3808
 __napi_poll+0xd9/0x990 net/core/dev.c:6771
 napi_poll net/core/dev.c:6840 [inline]
 net_rx_action+0x90f/0x17e0 net/core/dev.c:6962
 handle_softirqs+0x152/0x6b0 kernel/softirq.c:554
 __do_softirq kernel/softirq.c:588 [inline]
 invoke_softirq kernel/softirq.c:428 [inline]
 __irq_exit_rcu kernel/softirq.c:637 [inline]
 irq_exit_rcu+0x5d/0x120 kernel/softirq.c:649
 common_interrupt+0x83/0x90 arch/x86/kernel/irq.c:278
 asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:693
 __msan_instrument_asm_store+0xd6/0xe0
 arch_atomic_inc arch/x86/include/asm/atomic.h:53 [inline]
 raw_atomic_inc include/linux/atomic/atomic-arch-fallback.h:992 [inline]
 atomic_inc include/linux/atomic/atomic-instrumented.h:436 [inline]
 page_ref_inc include/linux/page_ref.h:153 [inline]
 folio_ref_inc include/linux/page_ref.h:160 [inline]
 filemap_map_order0_folio mm/filemap.c:3596 [inline]
 filemap_map_pages+0x11c7/0x2270 mm/filemap.c:3644
 do_fault_around mm/memory.c:4879 [inline]
 do_read_fault mm/memory.c:4912 [inline]
 do_fault mm/memory.c:5051 [inline]
 do_pte_missing mm/memory.c:3897 [inline]
 handle_pte_fault mm/memory.c:5381 [inline]
 __handle_mm_fault mm/memory.c:5524 [inline]
 handle_mm_fault+0x3677/0x6f00 mm/memory.c:5689
 do_user_addr_fault+0x1373/0x2b20 arch/x86/mm/fault.c:1338
 handle_page_fault arch/x86/mm/fault.c:1481 [inline]
 exc_page_fault+0x54/0xc0 arch/x86/mm/fault.c:1539
 asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623

Uninit was stored to memory at:
 tcp_create_openreq_child+0x1984/0x1ff0 net/ipv4/tcp_minisocks.c:611
 tcp_v4_syn_recv_sock+0x18e/0x2170 net/ipv4/tcp_ipv4.c:1754
 tcp_check_req+0x1a3e/0x20c0 net/ipv4/tcp_minisocks.c:852
 tcp_v4_rcv+0x26a4/0x53a0 net/ipv4/tcp_ipv4.c:2265
 ip_protocol_deliver_rcu+0x884/0x1270 net/ipv4/ip_input.c:205
 ip_local_deliver_finish+0x30f/0x530 net/ipv4/ip_input.c:233
 NF_HOOK include/linux/netfilter.h:314 [inline]
 ip_local_deliver+0x230/0x4c0 net/ipv4/ip_input.c:254
 dst_input include/net/dst.h:460 [inline]
 ip_sublist_rcv_finish net/ipv4/ip_input.c:580 [inline]
 ip_list_rcv_finish net/ipv4/ip_input.c:631 [inline]
 ip_sublist_rcv+0x10f7/0x13e0 net/ipv4/ip_input.c:639
 ip_list_rcv+0x952/0x9c0 net/ipv4/ip_input.c:674
 __netif_receive_skb_list_ptype net/core/dev.c:5703 [inline]
 __netif_receive_skb_list_core+0xd92/0x11d0 net/core/dev.c:5751
 __netif_receive_skb_list net/core/dev.c:5803 [inline]
 netif_receive_skb_list_internal+0xd8f/0x1350 net/core/dev.c:5895
 gro_normal_list include/net/gro.h:515 [inline]
 napi_complete_done+0x3f2/0x990 net/core/dev.c:6246
 e1000_clean+0x1fa4/0x5e50 drivers/net/ethernet/intel/e1000/e1000_main.c:3808
 __napi_poll+0xd9/0x990 net/core/dev.c:6771
 napi_poll net/core/dev.c:6840 [inline]
 net_rx_action+0x90f/0x17e0 net/core/dev.c:6962
 handle_softirqs+0x152/0x6b0 kernel/softirq.c:554
 __do_softirq kernel/softirq.c:588 [inline]
 invoke_softirq kernel/softirq.c:428 [inline]
 __irq_exit_rcu kernel/softirq.c:637 [inline]
 irq_exit_rcu+0x5d/0x120 kernel/softirq.c:649
 common_interrupt+0x83/0x90 arch/x86/kernel/irq.c:278
 asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:693

Uninit was created at:
 __alloc_pages_noprof+0x82d/0xcb0 mm/page_alloc.c:4706
 __alloc_pages_node_noprof include/linux/gfp.h:269 [inline]
 alloc_pages_node_noprof include/linux/gfp.h:296 [inline]
 alloc_slab_page mm/slub.c:2265 [inline]
 allocate_slab mm/slub.c:2428 [inline]
 new_slab+0x2af/0x14e0 mm/slub.c:2481
 ___slab_alloc+0xf73/0x3150 mm/slub.c:3667
 __slab_alloc mm/slub.c:3757 [inline]
 __slab_alloc_node mm/slub.c:3810 [inline]
 slab_alloc_node mm/slub.c:3990 [inline]
 kmem_cache_alloc_noprof+0x53a/0x9f0 mm/slub.c:4009
 reqsk_alloc_noprof net/ipv4/inet_connection_sock.c:920 [inline]
 inet_reqsk_alloc+0x63/0x700 net/ipv4/inet_connection_sock.c:951
 tcp_conn_request+0x339/0x4860 net/ipv4/tcp_input.c:7177
 tcp_v4_conn_request+0x13b/0x190 net/ipv4/tcp_ipv4.c:1719
 tcp_rcv_state_process+0x2dd/0x4a10 net/ipv4/tcp_input.c:6711
 tcp_v4_do_rcv+0xbee/0x10d0 net/ipv4/tcp_ipv4.c:1932
 tcp_v4_rcv+0x3fad/0x53a0 net/ipv4/tcp_ipv4.c:2334
 ip_protocol_deliver_rcu+0x884/0x1270 net/ipv4/ip_input.c:205
 ip_local_deliver_finish+0x30f/0x530 net/ipv4/ip_input.c:233
 NF_HOOK include/linux/netfilter.h:314 [inline]
 ip_local_deliver+0x230/0x4c0 net/ipv4/ip_input.c:254
 dst_input include/net/dst.h:460 [inline]
 ip_sublist_rcv_finish net/ipv4/ip_input.c:580 [inline]
 ip_list_rcv_finish net/ipv4/ip_input.c:631 [inline]
 ip_sublist_rcv+0x10f7/0x13e0 net/ipv4/ip_input.c:639
 ip_list_rcv+0x952/0x9c0 net/ipv4/ip_input.c:674
 __netif_receive_skb_list_ptype net/core/dev.c:5703 [inline]
 __netif_receive_skb_list_core+0xd92/0x11d0 net/core/dev.c:5751
 __netif_receive_skb_list net/core/dev.c:5803 [inline]
 netif_receive_skb_list_internal+0xd8f/0x1350 net/core/dev.c:5895
 gro_normal_list include/net/gro.h:515 [inline]
 napi_complete_done+0x3f2/0x990 net/core/dev.c:6246
 e1000_clean+0x1fa4/0x5e50 drivers/net/ethernet/intel/e1000/e1000_main.c:3808
 __napi_poll+0xd9/0x990 net/core/dev.c:6771
 napi_poll net/core/dev.c:6840 [inline]
 net_rx_action+0x90f/0x17e0 net/core/dev.c:6962
 handle_softirqs+0x152/0x6b0 kernel/softirq.c:554
 __do_softirq kernel/softirq.c:588 [inline]
 invoke_softirq kernel/softirq.c:428 [inline]
 __irq_exit_rcu kernel/softirq.c:637 [inline]
 irq_exit_rcu+0x5d/0x120 kernel/softirq.c:649
 common_interrupt+0x83/0x90 arch/x86/kernel/irq.c:278
 asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:693

CPU: 0 PID: 239 Comm: modprobe Tainted: G    B              6.10.0-rc7-01816-g852e42cc2dd4 #3 1107521f0c7b55c9309062382d0bda9f604dbb6d
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014

Fixes: 06b22ef ("net/tcp: Wire TCP-AO to request sockets")
Reported-by: syzkaller <[email protected]>
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Dmitry Safonov <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Jul 17, 2024
When tries to demote 1G hugetlb folios, a lockdep warning is observed:

============================================
WARNING: possible recursive locking detected
6.10.0-rc6-00452-ga4d0275fa660-dirty #79 Not tainted
--------------------------------------------
bash/710 is trying to acquire lock:
ffffffff8f0a7850 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0x244/0x460

but task is already holding lock:
ffffffff8f0a6f48 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0xae/0x460

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&h->resize_lock);
  lock(&h->resize_lock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

4 locks held by bash/710:
 #0: ffff8f118439c3f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0
 #1: ffff8f11893b9e88 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0
 #2: ffff8f1183dc4428 (kn->active#98){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0
 #3: ffffffff8f0a6f48 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0xae/0x460

stack backtrace:
CPU: 3 PID: 710 Comm: bash Not tainted 6.10.0-rc6-00452-ga4d0275fa660-dirty #79
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x68/0xa0
 __lock_acquire+0x10f2/0x1ca0
 lock_acquire+0xbe/0x2d0
 __mutex_lock+0x6d/0x400
 demote_store+0x244/0x460
 kernfs_fop_write_iter+0x12c/0x1d0
 vfs_write+0x380/0x540
 ksys_write+0x64/0xe0
 do_syscall_64+0xb9/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fa61db14887
RSP: 002b:00007ffc56c48358 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fa61db14887
RDX: 0000000000000002 RSI: 000055a030050220 RDI: 0000000000000001
RBP: 000055a030050220 R08: 00007fa61dbd1460 R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 00007fa61dc1b780 R14: 00007fa61dc17600 R15: 00007fa61dc16a00
 </TASK>

Lockdep considers this an AA deadlock because the different resize_lock
mutexes reside in the same lockdep class, but this is a false positive.
Place them in distinct classes to avoid these warnings.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8531fc6 ("hugetlb: add hugetlb demote page support")
Signed-off-by: Miaohe Lin <[email protected]>
Acked-by: Muchun Song <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Jul 18, 2024
When tries to demote 1G hugetlb folios, a lockdep warning is observed:

============================================
WARNING: possible recursive locking detected
6.10.0-rc6-00452-ga4d0275fa660-dirty #79 Not tainted
--------------------------------------------
bash/710 is trying to acquire lock:
ffffffff8f0a7850 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0x244/0x460

but task is already holding lock:
ffffffff8f0a6f48 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0xae/0x460

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&h->resize_lock);
  lock(&h->resize_lock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

4 locks held by bash/710:
 #0: ffff8f118439c3f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0
 #1: ffff8f11893b9e88 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0
 #2: ffff8f1183dc4428 (kn->active#98){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0
 #3: ffffffff8f0a6f48 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0xae/0x460

stack backtrace:
CPU: 3 PID: 710 Comm: bash Not tainted 6.10.0-rc6-00452-ga4d0275fa660-dirty #79
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x68/0xa0
 __lock_acquire+0x10f2/0x1ca0
 lock_acquire+0xbe/0x2d0
 __mutex_lock+0x6d/0x400
 demote_store+0x244/0x460
 kernfs_fop_write_iter+0x12c/0x1d0
 vfs_write+0x380/0x540
 ksys_write+0x64/0xe0
 do_syscall_64+0xb9/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fa61db14887
RSP: 002b:00007ffc56c48358 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fa61db14887
RDX: 0000000000000002 RSI: 000055a030050220 RDI: 0000000000000001
RBP: 000055a030050220 R08: 00007fa61dbd1460 R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 00007fa61dc1b780 R14: 00007fa61dc17600 R15: 00007fa61dc16a00
 </TASK>

Lockdep considers this an AA deadlock because the different resize_lock
mutexes reside in the same lockdep class, but this is a false positive.
Place them in distinct classes to avoid these warnings.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8531fc6 ("hugetlb: add hugetlb demote page support")
Signed-off-by: Miaohe Lin <[email protected]>
Acked-by: Muchun Song <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Jul 21, 2024
When tries to demote 1G hugetlb folios, a lockdep warning is observed:

============================================
WARNING: possible recursive locking detected
6.10.0-rc6-00452-ga4d0275fa660-dirty #79 Not tainted
--------------------------------------------
bash/710 is trying to acquire lock:
ffffffff8f0a7850 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0x244/0x460

but task is already holding lock:
ffffffff8f0a6f48 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0xae/0x460

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&h->resize_lock);
  lock(&h->resize_lock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

4 locks held by bash/710:
 #0: ffff8f118439c3f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0
 #1: ffff8f11893b9e88 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0
 #2: ffff8f1183dc4428 (kn->active#98){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0
 #3: ffffffff8f0a6f48 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0xae/0x460

stack backtrace:
CPU: 3 PID: 710 Comm: bash Not tainted 6.10.0-rc6-00452-ga4d0275fa660-dirty #79
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x68/0xa0
 __lock_acquire+0x10f2/0x1ca0
 lock_acquire+0xbe/0x2d0
 __mutex_lock+0x6d/0x400
 demote_store+0x244/0x460
 kernfs_fop_write_iter+0x12c/0x1d0
 vfs_write+0x380/0x540
 ksys_write+0x64/0xe0
 do_syscall_64+0xb9/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fa61db14887
RSP: 002b:00007ffc56c48358 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fa61db14887
RDX: 0000000000000002 RSI: 000055a030050220 RDI: 0000000000000001
RBP: 000055a030050220 R08: 00007fa61dbd1460 R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 00007fa61dc1b780 R14: 00007fa61dc17600 R15: 00007fa61dc16a00
 </TASK>

Lockdep considers this an AA deadlock because the different resize_lock
mutexes reside in the same lockdep class, but this is a false positive.
Place them in distinct classes to avoid these warnings.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8531fc6 ("hugetlb: add hugetlb demote page support")
Signed-off-by: Miaohe Lin <[email protected]>
Acked-by: Muchun Song <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Jul 30, 2024
When using cachefiles, lockdep may emit something similar to the circular
locking dependency notice below.  The problem appears to stem from the
following:

 (1) Cachefiles manipulates xattrs on the files in its cache when called
     from ->writepages().

 (2) The setxattr() and removexattr() system call handlers get the name
     (and value) from userspace after taking the sb_writers lock, putting
     accesses of the vma->vm_lock and mm->mmap_lock inside of that.

 (3) The afs filesystem uses a per-inode lock to prevent multiple
     revalidation RPCs and in writeback vs truncate to prevent parallel
     operations from deadlocking against the server on one side and local
     page locks on the other.

Fix this by moving the getting of the name and value in {get,remove}xattr()
outside of the sb_writers lock.  This also has the minor benefits that we
don't need to reget these in the event of a retry and we never try to take
the sb_writers lock in the event we can't pull the name and value into the
kernel.

Alternative approaches that might fix this include moving the dispatch of a
write to the cache off to a workqueue or trying to do without the
validation lock in afs.  Note that this might also affect other filesystems
that use netfslib and/or cachefiles.

 ======================================================
 WARNING: possible circular locking dependency detected
 6.10.0-build2+ #956 Not tainted
 ------------------------------------------------------
 fsstress/6050 is trying to acquire lock:
 ffff888138fd82f0 (mapping.invalidate_lock#3){++++}-{3:3}, at: filemap_fault+0x26e/0x8b0

 but task is already holding lock:
 ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #4 (&vma->vm_lock->lock){++++}-{3:3}:
        __lock_acquire+0xaf0/0xd80
        lock_acquire.part.0+0x103/0x280
        down_write+0x3b/0x50
        vma_start_write+0x6b/0xa0
        vma_link+0xcc/0x140
        insert_vm_struct+0xb7/0xf0
        alloc_bprm+0x2c1/0x390
        kernel_execve+0x65/0x1a0
        call_usermodehelper_exec_async+0x14d/0x190
        ret_from_fork+0x24/0x40
        ret_from_fork_asm+0x1a/0x30

 -> #3 (&mm->mmap_lock){++++}-{3:3}:
        __lock_acquire+0xaf0/0xd80
        lock_acquire.part.0+0x103/0x280
        __might_fault+0x7c/0xb0
        strncpy_from_user+0x25/0x160
        removexattr+0x7f/0x100
        __do_sys_fremovexattr+0x7e/0xb0
        do_syscall_64+0x9f/0x100
        entry_SYSCALL_64_after_hwframe+0x76/0x7e

 -> #2 (sb_writers#14){.+.+}-{0:0}:
        __lock_acquire+0xaf0/0xd80
        lock_acquire.part.0+0x103/0x280
        percpu_down_read+0x3c/0x90
        vfs_iocb_iter_write+0xe9/0x1d0
        __cachefiles_write+0x367/0x430
        cachefiles_issue_write+0x299/0x2f0
        netfs_advance_write+0x117/0x140
        netfs_write_folio.isra.0+0x5ca/0x6e0
        netfs_writepages+0x230/0x2f0
        afs_writepages+0x4d/0x70
        do_writepages+0x1e8/0x3e0
        filemap_fdatawrite_wbc+0x84/0xa0
        __filemap_fdatawrite_range+0xa8/0xf0
        file_write_and_wait_range+0x59/0x90
        afs_release+0x10f/0x270
        __fput+0x25f/0x3d0
        __do_sys_close+0x43/0x70
        do_syscall_64+0x9f/0x100
        entry_SYSCALL_64_after_hwframe+0x76/0x7e

 -> #1 (&vnode->validate_lock){++++}-{3:3}:
        __lock_acquire+0xaf0/0xd80
        lock_acquire.part.0+0x103/0x280
        down_read+0x95/0x200
        afs_writepages+0x37/0x70
        do_writepages+0x1e8/0x3e0
        filemap_fdatawrite_wbc+0x84/0xa0
        filemap_invalidate_inode+0x167/0x1e0
        netfs_unbuffered_write_iter+0x1bd/0x2d0
        vfs_write+0x22e/0x320
        ksys_write+0xbc/0x130
        do_syscall_64+0x9f/0x100
        entry_SYSCALL_64_after_hwframe+0x76/0x7e

 -> #0 (mapping.invalidate_lock#3){++++}-{3:3}:
        check_noncircular+0x119/0x160
        check_prev_add+0x195/0x430
        __lock_acquire+0xaf0/0xd80
        lock_acquire.part.0+0x103/0x280
        down_read+0x95/0x200
        filemap_fault+0x26e/0x8b0
        __do_fault+0x57/0xd0
        do_pte_missing+0x23b/0x320
        __handle_mm_fault+0x2d4/0x320
        handle_mm_fault+0x14f/0x260
        do_user_addr_fault+0x2a2/0x500
        exc_page_fault+0x71/0x90
        asm_exc_page_fault+0x22/0x30

 other info that might help us debug this:

 Chain exists of:
   mapping.invalidate_lock#3 --> &mm->mmap_lock --> &vma->vm_lock->lock

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   rlock(&vma->vm_lock->lock);
                                lock(&mm->mmap_lock);
                                lock(&vma->vm_lock->lock);
   rlock(mapping.invalidate_lock#3);

  *** DEADLOCK ***

 1 lock held by fsstress/6050:
  #0: ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250

 stack backtrace:
 CPU: 0 PID: 6050 Comm: fsstress Not tainted 6.10.0-build2+ #956
 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x57/0x80
  check_noncircular+0x119/0x160
  ? queued_spin_lock_slowpath+0x4be/0x510
  ? __pfx_check_noncircular+0x10/0x10
  ? __pfx_queued_spin_lock_slowpath+0x10/0x10
  ? mark_lock+0x47/0x160
  ? init_chain_block+0x9c/0xc0
  ? add_chain_block+0x84/0xf0
  check_prev_add+0x195/0x430
  __lock_acquire+0xaf0/0xd80
  ? __pfx___lock_acquire+0x10/0x10
  ? __lock_release.isra.0+0x13b/0x230
  lock_acquire.part.0+0x103/0x280
  ? filemap_fault+0x26e/0x8b0
  ? __pfx_lock_acquire.part.0+0x10/0x10
  ? rcu_is_watching+0x34/0x60
  ? lock_acquire+0xd7/0x120
  down_read+0x95/0x200
  ? filemap_fault+0x26e/0x8b0
  ? __pfx_down_read+0x10/0x10
  ? __filemap_get_folio+0x25/0x1a0
  filemap_fault+0x26e/0x8b0
  ? __pfx_filemap_fault+0x10/0x10
  ? find_held_lock+0x7c/0x90
  ? __pfx___lock_release.isra.0+0x10/0x10
  ? __pte_offset_map+0x99/0x110
  __do_fault+0x57/0xd0
  do_pte_missing+0x23b/0x320
  __handle_mm_fault+0x2d4/0x320
  ? __pfx___handle_mm_fault+0x10/0x10
  handle_mm_fault+0x14f/0x260
  do_user_addr_fault+0x2a2/0x500
  exc_page_fault+0x71/0x90
  asm_exc_page_fault+0x22/0x30

Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
cc: Alexander Viro <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Jan Kara <[email protected]>
cc: Jeff Layton <[email protected]>
cc: Gao Xiang <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
[brauner: fix minor issues]
Signed-off-by: Christian Brauner <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Jul 30, 2024
…-level'

Eduard Zingerman says:

====================
bpf: track find_equal_scalars history on per-instruction level

This is a fix for precision tracking bug reported in [0].
It supersedes my previous attempt to fix similar issue in commit [1].
Here is a minimized test case from [0]:

    0:  call bpf_get_prandom_u32;
    1:  r7 = r0;
    2:  r8 = r0;
    3:  call bpf_get_prandom_u32;
    4:  if r0 > 1 goto +0;
    /* --- checkpoint #1: r7.id=1, r8.id=1 --- */
    5:  if r8 >= r0 goto 9f;
    6:  r8 += r8;
    /* --- checkpoint #2: r7.id=1, r8.id=0 --- */
    7:  if r7 == 0 goto 9f;
    8:  r0 /= 0;
    /* --- checkpoint #3 --- */
    9:  r0 = 42;
    10: exit;

W/o this fix verifier incorrectly assumes that instruction at label
(8) is unreachable. The issue is caused by failure to infer
precision mark for r0 at checkpoint #1:
- first verification path is:
  - (0-4): r0 range [0,1];
  - (5): r8 range [0,0], propagated to r7;
  - (6): r8.id is reset;
  - (7): jump is predicted to happen;
  - (9-10): safe exit.
- when jump at (7) is predicted mark_chain_precision() for r7 is
  called and backtrack_insn() proceeds as follows:
  - at (7) r7 is marked as precise;
  - at (5) r8 is not currently tracked and thus r0 is not marked;
  - at (4-5) boundary logic from [1] is triggered and r7,r8 are marked
    as precise;
  - => r0 precision mark is missed.
- when second branch of (4) is considered, verifier prunes the state
  because r0 is not marked as precise in the visited state.

Basically, backtracking logic fails to notice that at (5)
range information is gained for both r7 and r8, and thus both
r8 and r0 have to be marked as precise.
This happens because [1] can only account for such range
transfers at parent/child state boundaries.

The solution suggested by Andrii Nakryiko in [0] is to use jump
history to remember which registers gained range as a result of
find_equal_scalars() [renamed to sync_linked_regs()] and use
this information in backtrack_insn().
Which is what this patch-set does.

The patch-set uses u64 value as a vector of 10-bit values that
identify registers gaining range in find_equal_scalars().
This amounts to maximum of 6 possible values.
To check if such capacity is sufficient I've instrumented kernel
to track a histogram for maximal amount of registers that gain range
in find_equal_scalars per program verification [2].
Measurements done for verifier selftests and Cilium bpf object files
from [3] show that number of such registers is *always* <= 4 and
in 98% of cases it is <= 2.

When tested on a subset of selftests identified by
selftests/bpf/veristat.cfg and Cilium bpf object files from [3]
this patch-set has minimal verification performance impact:

File                      Program                   Insns   (DIFF)  States (DIFF)
------------------------  ------------------------  --------------  -------------
bpf_host.o                tail_handle_nat_fwd_ipv4    -75 (-0.61%)    -3 (-0.39%)
pyperf600_nounroll.bpf.o  on_event                  +1673 (+0.33%)    +3 (+0.01%)

[0] https://lore.kernel.org/bpf/CAEf4BzZ0xidVCqB47XnkXcNhkPWF6_nTV7yt+_Lf0kcFEut2Mg@mail.gmail.com/
[1] commit 904e6dd ("bpf: Use scalar ids in mark_chain_precision()")
[2] https:/eddyz87/bpf/tree/find-equal-scalars-in-jump-history-with-stats
[3] https:/anakryiko/cilium

Changes:
- v2 -> v3:
  A number of stylistic changes suggested by Andrii:
  - renamings:
    - struct reg_or_spill   -> linked_reg;
    - find_equal_scalars()  -> collect_linked_regs;
    - copy_known_reg()      -> sync_linked_regs;
  - collect_linked_regs() now returns linked regs set of
    size 2 or larger;
  - dropped usage of bit fields in struct linked_reg;
  - added a patch changing references to find_equal_scalars() in
    selftests comments.
- v1 -> v2:
  - patch "bpf: replace env->cur_hist_ent with a getter function" is
    dropped (Andrii);
  - added structure linked_regs and helper functions to [de]serialize
    u64 value as such structure (Andrii);
  - bt_set_equal_scalars() renamed to bt_sync_linked_regs(), moved to
    start and end of backtrack_insn() in order to untie linked
    register logic from conditional jumps backtracking.
    Andrii requested a more radical change of moving linked registers
    processing to bt_set_xxx() functions, I did an experiment in this
    direction:
    https:/eddyz87/bpf/tree/find-equal-scalars-in-jump-history--linked-regs-in-bt-set-reg
    the end result of the experiment seems much uglier than version
    presented in v2.

Revisions:
- v1: https://lore.kernel.org/bpf/[email protected]/
- v2: https://lore.kernel.org/bpf/[email protected]/
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Andrii Nakryiko <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Aug 9, 2024
…ptions

Patch series "mm: split PTE/PMD PT table Kconfig cleanups+clarifications".

This series is a follow up to the fixes:
	"[PATCH v1 0/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking"

When working on the fixes, I wondered why 8xx is fine (-> never uses split
PT locks) and how PT locking even works properly with PMD page table
sharing (-> always requires split PMD PT locks).

Let's improve the split PT lock detection, make hugetlb properly depend on
it and make 8xx bail out if it would ever get enabled by accident.

As an alternative to patch #3 we could extend the Kconfig
SPLIT_PTE_PTLOCKS option from patch #2 -- but enforcing it closer to the
code that actually implements it feels a bit nicer for documentation
purposes, and there is no need to actually disable it because it should
always be disabled (!SMP).

Did a bunch of cross-compilations to make sure that split PTE/PMD PT locks
are still getting used where we would expect them.

[1] https://lkml.kernel.org/r/[email protected]


This patch (of 3):

Let's clean that up a bit and prepare for depending on
CONFIG_SPLIT_PMD_PTLOCKS in other Kconfig options.

More cleanups would be reasonable (like the arch-specific "depends on" for
CONFIG_SPLIT_PTE_PTLOCKS), but we'll leave that for another day.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Qi Zheng <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: "Naveen N. Rao" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Aug 9, 2024
iter_finish_branch_entry() doesn't put the branch_info from/to map
elements creating memory leaks. This can be seen with:

```
$ perf record -e cycles -b perf test -w noploop
$ perf report -D
...
Direct leak of 984344 byte(s) in 123043 object(s) allocated from:
    #0 0x7fb2654f3bd7 in malloc libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x564d3400d10b in map__get util/map.h:186
    #2 0x564d3400d10b in ip__resolve_ams util/machine.c:1981
    #3 0x564d34014d81 in sample__resolve_bstack util/machine.c:2151
    #4 0x564d34094790 in iter_prepare_branch_entry util/hist.c:898
    #5 0x564d34098fa4 in hist_entry_iter__add util/hist.c:1238
    #6 0x564d33d1f0c7 in process_sample_event tools/perf/builtin-report.c:334
    #7 0x564d34031eb7 in perf_session__deliver_event util/session.c:1655
    #8 0x564d3403ba52 in do_flush util/ordered-events.c:245
    #9 0x564d3403ba52 in __ordered_events__flush util/ordered-events.c:324
    #10 0x564d3402d32e in perf_session__process_user_event util/session.c:1708
    #11 0x564d34032480 in perf_session__process_event util/session.c:1877
    #12 0x564d340336ad in reader__read_event util/session.c:2399
    #13 0x564d34033fdc in reader__process_events util/session.c:2448
    #14 0x564d34033fdc in __perf_session__process_events util/session.c:2495
    #15 0x564d34033fdc in perf_session__process_events util/session.c:2661
    #16 0x564d33d27113 in __cmd_report tools/perf/builtin-report.c:1065
    #17 0x564d33d27113 in cmd_report tools/perf/builtin-report.c:1805
    #18 0x564d33e0ccb7 in run_builtin tools/perf/perf.c:350
    #19 0x564d33e0d45e in handle_internal_command tools/perf/perf.c:403
    #20 0x564d33cdd827 in run_argv tools/perf/perf.c:447
    #21 0x564d33cdd827 in main tools/perf/perf.c:561
...
```

Clearing up the map_symbols properly creates maps reference count
issues so resolve those. Resolving this issue doesn't improve peak
heap consumption for the test above.

Committer testing:

  $ sudo dnf install libasan
  $ make -k CORESIGHT=1 EXTRA_CFLAGS="-fsanitize=address" CC=clang O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin

Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sun Haiyong <[email protected]>
Cc: Yanteng Si <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Aug 9, 2024
…ptions

Patch series "mm: split PTE/PMD PT table Kconfig cleanups+clarifications".

This series is a follow up to the fixes:
	"[PATCH v1 0/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking"

When working on the fixes, I wondered why 8xx is fine (-> never uses split
PT locks) and how PT locking even works properly with PMD page table
sharing (-> always requires split PMD PT locks).

Let's improve the split PT lock detection, make hugetlb properly depend on
it and make 8xx bail out if it would ever get enabled by accident.

As an alternative to patch #3 we could extend the Kconfig
SPLIT_PTE_PTLOCKS option from patch #2 -- but enforcing it closer to the
code that actually implements it feels a bit nicer for documentation
purposes, and there is no need to actually disable it because it should
always be disabled (!SMP).

Did a bunch of cross-compilations to make sure that split PTE/PMD PT locks
are still getting used where we would expect them.

[1] https://lkml.kernel.org/r/[email protected]


This patch (of 3):

Let's clean that up a bit and prepare for depending on
CONFIG_SPLIT_PMD_PTLOCKS in other Kconfig options.

More cleanups would be reasonable (like the arch-specific "depends on" for
CONFIG_SPLIT_PTE_PTLOCKS), but we'll leave that for another day.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Qi Zheng <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: "Naveen N. Rao" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Aug 9, 2024
When l2tp tunnels use a socket provided by userspace, we can hit
lockdep splats like the below when data is transmitted through another
(unrelated) userspace socket which then gets routed over l2tp.

This issue was previously discussed here:
https://lore.kernel.org/netdev/[email protected]/

The solution is to have lockdep treat socket locks of l2tp tunnel
sockets separately than those of standard INET sockets. To do so, use
a different lockdep subclass where lock nesting is possible.

  ============================================
  WARNING: possible recursive locking detected
  6.10.0+ #34 Not tainted
  --------------------------------------------
  iperf3/771 is trying to acquire lock:
  ffff8881027601d8 (slock-AF_INET/1){+.-.}-{2:2}, at: l2tp_xmit_skb+0x243/0x9d0

  but task is already holding lock:
  ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(slock-AF_INET/1);
    lock(slock-AF_INET/1);

   *** DEADLOCK ***

   May be due to missing lock nesting notation

  10 locks held by iperf3/771:
   #0: ffff888102650258 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x1a/0x40
   #1: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0
   #2: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130
   #3: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x28b/0x9f0
   #4: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0xf9/0x260
   #5: ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10
   #6: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0
   #7: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130
   #8: ffffffff822ac1e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0xcc/0x1450
   #9: ffff888101f33258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock#2){+...}-{2:2}, at: __dev_queue_xmit+0x513/0x1450

  stack backtrace:
  CPU: 2 UID: 0 PID: 771 Comm: iperf3 Not tainted 6.10.0+ #34
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  Call Trace:
   <IRQ>
   dump_stack_lvl+0x69/0xa0
   dump_stack+0xc/0x20
   __lock_acquire+0x135d/0x2600
   ? srso_alias_return_thunk+0x5/0xfbef5
   lock_acquire+0xc4/0x2a0
   ? l2tp_xmit_skb+0x243/0x9d0
   ? __skb_checksum+0xa3/0x540
   _raw_spin_lock_nested+0x35/0x50
   ? l2tp_xmit_skb+0x243/0x9d0
   l2tp_xmit_skb+0x243/0x9d0
   l2tp_eth_dev_xmit+0x3c/0xc0
   dev_hard_start_xmit+0x11e/0x420
   sch_direct_xmit+0xc3/0x640
   __dev_queue_xmit+0x61c/0x1450
   ? ip_finish_output2+0xf4c/0x1130
   ip_finish_output2+0x6b6/0x1130
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __ip_finish_output+0x217/0x380
   ? srso_alias_return_thunk+0x5/0xfbef5
   __ip_finish_output+0x217/0x380
   ip_output+0x99/0x120
   __ip_queue_xmit+0xae4/0xbc0
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? tcp_options_write.constprop.0+0xcb/0x3e0
   ip_queue_xmit+0x34/0x40
   __tcp_transmit_skb+0x1625/0x1890
   __tcp_send_ack+0x1b8/0x340
   tcp_send_ack+0x23/0x30
   __tcp_ack_snd_check+0xa8/0x530
   ? srso_alias_return_thunk+0x5/0xfbef5
   tcp_rcv_established+0x412/0xd70
   tcp_v4_do_rcv+0x299/0x420
   tcp_v4_rcv+0x1991/0x1e10
   ip_protocol_deliver_rcu+0x50/0x220
   ip_local_deliver_finish+0x158/0x260
   ip_local_deliver+0xc8/0xe0
   ip_rcv+0xe5/0x1d0
   ? __pfx_ip_rcv+0x10/0x10
   __netif_receive_skb_one_core+0xce/0xe0
   ? process_backlog+0x28b/0x9f0
   __netif_receive_skb+0x34/0xd0
   ? process_backlog+0x28b/0x9f0
   process_backlog+0x2cb/0x9f0
   __napi_poll.constprop.0+0x61/0x280
   net_rx_action+0x332/0x670
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? find_held_lock+0x2b/0x80
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   handle_softirqs+0xda/0x480
   ? __dev_queue_xmit+0xa2c/0x1450
   do_softirq+0xa1/0xd0
   </IRQ>
   <TASK>
   __local_bh_enable_ip+0xc8/0xe0
   ? __dev_queue_xmit+0xa2c/0x1450
   __dev_queue_xmit+0xa48/0x1450
   ? ip_finish_output2+0xf4c/0x1130
   ip_finish_output2+0x6b6/0x1130
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __ip_finish_output+0x217/0x380
   ? srso_alias_return_thunk+0x5/0xfbef5
   __ip_finish_output+0x217/0x380
   ip_output+0x99/0x120
   __ip_queue_xmit+0xae4/0xbc0
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? tcp_options_write.constprop.0+0xcb/0x3e0
   ip_queue_xmit+0x34/0x40
   __tcp_transmit_skb+0x1625/0x1890
   tcp_write_xmit+0x766/0x2fb0
   ? __entry_text_end+0x102ba9/0x102bad
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __might_fault+0x74/0xc0
   ? srso_alias_return_thunk+0x5/0xfbef5
   __tcp_push_pending_frames+0x56/0x190
   tcp_push+0x117/0x310
   tcp_sendmsg_locked+0x14c1/0x1740
   tcp_sendmsg+0x28/0x40
   inet_sendmsg+0x5d/0x90
   sock_write_iter+0x242/0x2b0
   vfs_write+0x68d/0x800
   ? __pfx_sock_write_iter+0x10/0x10
   ksys_write+0xc8/0xf0
   __x64_sys_write+0x3d/0x50
   x64_sys_call+0xfaf/0x1f50
   do_syscall_64+0x6d/0x140
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7f4d143af992
  Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 01 cc ff ff 41 54 b8 02 00 00 0
  RSP: 002b:00007ffd65032058 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f4d143af992
  RDX: 0000000000000025 RSI: 00007f4d143f3bcc RDI: 0000000000000005
  RBP: 00007f4d143f2b28 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4d143f3bcc
  R13: 0000000000000005 R14: 0000000000000000 R15: 00007ffd650323f0
   </TASK>

Fixes: 0b2c597 ("l2tp: close all race conditions in l2tp_tunnel_register()")
Suggested-by: Eric Dumazet <[email protected]>
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=6acef9e0a4d1f46c83d4
CC: [email protected]
CC: [email protected]
Signed-off-by: James Chapman <[email protected]>
Signed-off-by: Tom Parkin <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Aug 9, 2024
…ptions

Patch series "mm: split PTE/PMD PT table Kconfig cleanups+clarifications".

This series is a follow up to the fixes:
	"[PATCH v1 0/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking"

When working on the fixes, I wondered why 8xx is fine (-> never uses split
PT locks) and how PT locking even works properly with PMD page table
sharing (-> always requires split PMD PT locks).

Let's improve the split PT lock detection, make hugetlb properly depend on
it and make 8xx bail out if it would ever get enabled by accident.

As an alternative to patch #3 we could extend the Kconfig
SPLIT_PTE_PTLOCKS option from patch #2 -- but enforcing it closer to the
code that actually implements it feels a bit nicer for documentation
purposes, and there is no need to actually disable it because it should
always be disabled (!SMP).

Did a bunch of cross-compilations to make sure that split PTE/PMD PT locks
are still getting used where we would expect them.

[1] https://lkml.kernel.org/r/[email protected]


This patch (of 3):

Let's clean that up a bit and prepare for depending on
CONFIG_SPLIT_PMD_PTLOCKS in other Kconfig options.

More cleanups would be reasonable (like the arch-specific "depends on" for
CONFIG_SPLIT_PTE_PTLOCKS), but we'll leave that for another day.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Qi Zheng <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: "Naveen N. Rao" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Aug 12, 2024
iter_finish_branch_entry() doesn't put the branch_info from/to map
elements creating memory leaks. This can be seen with:

```
$ perf record -e cycles -b perf test -w noploop
$ perf report -D
...
Direct leak of 984344 byte(s) in 123043 object(s) allocated from:
    #0 0x7fb2654f3bd7 in malloc libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x564d3400d10b in map__get util/map.h:186
    #2 0x564d3400d10b in ip__resolve_ams util/machine.c:1981
    #3 0x564d34014d81 in sample__resolve_bstack util/machine.c:2151
    #4 0x564d34094790 in iter_prepare_branch_entry util/hist.c:898
    #5 0x564d34098fa4 in hist_entry_iter__add util/hist.c:1238
    #6 0x564d33d1f0c7 in process_sample_event tools/perf/builtin-report.c:334
    #7 0x564d34031eb7 in perf_session__deliver_event util/session.c:1655
    #8 0x564d3403ba52 in do_flush util/ordered-events.c:245
    #9 0x564d3403ba52 in __ordered_events__flush util/ordered-events.c:324
    #10 0x564d3402d32e in perf_session__process_user_event util/session.c:1708
    #11 0x564d34032480 in perf_session__process_event util/session.c:1877
    #12 0x564d340336ad in reader__read_event util/session.c:2399
    #13 0x564d34033fdc in reader__process_events util/session.c:2448
    #14 0x564d34033fdc in __perf_session__process_events util/session.c:2495
    #15 0x564d34033fdc in perf_session__process_events util/session.c:2661
    #16 0x564d33d27113 in __cmd_report tools/perf/builtin-report.c:1065
    #17 0x564d33d27113 in cmd_report tools/perf/builtin-report.c:1805
    #18 0x564d33e0ccb7 in run_builtin tools/perf/perf.c:350
    #19 0x564d33e0d45e in handle_internal_command tools/perf/perf.c:403
    #20 0x564d33cdd827 in run_argv tools/perf/perf.c:447
    #21 0x564d33cdd827 in main tools/perf/perf.c:561
...
```

Clearing up the map_symbols properly creates maps reference count
issues so resolve those. Resolving this issue doesn't improve peak
heap consumption for the test above.

Committer testing:

  $ sudo dnf install libasan
  $ make -k CORESIGHT=1 EXTRA_CFLAGS="-fsanitize=address" CC=clang O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin

Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sun Haiyong <[email protected]>
Cc: Yanteng Si <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Aug 12, 2024
Tariq Toukan says:

====================
mlx5 misc patches 2024-08-08

This patchset contains multiple enhancements from the team to the mlx5
core and Eth drivers.

Patch #1 by Chris bumps a defined value to permit more devices doing TC
offloads.

Patch #2 by Jianbo adds an IPsec fast-path optimization to replace the
slow async handling.

Patches #3 and #4 by Jianbo add TC offload support for complicated rules
to overcome firmware limitation.

Patch #5 by Gal unifies the access macro to advertised/supported link
modes.

Patches #6 to #9 by Gal adds extack messages in ethtool ops to replace
prints to the kernel log.

Patch #10 by Cosmin switches to using 'update' verb instead of 'replace'
to better reflect the operation.

Patch #11 by Cosmin exposes an update connection tracking operation to
replace the assumed delete+add implementaiton.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Aug 12, 2024
…ptions

Patch series "mm: split PTE/PMD PT table Kconfig cleanups+clarifications".

This series is a follow up to the fixes:
	"[PATCH v1 0/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking"

When working on the fixes, I wondered why 8xx is fine (-> never uses split
PT locks) and how PT locking even works properly with PMD page table
sharing (-> always requires split PMD PT locks).

Let's improve the split PT lock detection, make hugetlb properly depend on
it and make 8xx bail out if it would ever get enabled by accident.

As an alternative to patch #3 we could extend the Kconfig
SPLIT_PTE_PTLOCKS option from patch #2 -- but enforcing it closer to the
code that actually implements it feels a bit nicer for documentation
purposes, and there is no need to actually disable it because it should
always be disabled (!SMP).

Did a bunch of cross-compilations to make sure that split PTE/PMD PT locks
are still getting used where we would expect them.

[1] https://lkml.kernel.org/r/[email protected]


This patch (of 3):

Let's clean that up a bit and prepare for depending on
CONFIG_SPLIT_PMD_PTLOCKS in other Kconfig options.

More cleanups would be reasonable (like the arch-specific "depends on" for
CONFIG_SPLIT_PTE_PTLOCKS), but we'll leave that for another day.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Qi Zheng <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: "Naveen N. Rao" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Aug 14, 2024
Petr Machata says:

====================
net: nexthop: Increase weight to u16

In CLOS networks, as link failures occur at various points in the network,
ECMP weights of the involved nodes are adjusted to compensate. With high
fan-out of the involved nodes, and overall high number of nodes,
a (non-)ECMP weight ratio that we would like to configure does not fit into
8 bits. Instead of, say, 255:254, we might like to configure something like
1000:999. For these deployments, the 8-bit weight may not be enough.

To that end, in this patchset increase the next hop weight from u8 to u16.

Patch #1 adds a flag that indicates whether the reserved fields are zeroed.
This is a follow-up to a new fix merged in commit 6d745cd ("net:
nexthop: Initialize all fields in dumped nexthops"). The theory behind this
patch is that there is a strict ordering between the fields actually being
zeroed, the kernel declaring that they are, and the kernel repurposing the
fields. Thus clients can use the flag to tell if it is safe to interpret
the reserved fields in any way.

Patch #2 contains the substantial code and the commit message covers the
details of the changes.

Patches #3 to #6 add selftests.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Aug 16, 2024
…ptions

Patch series "mm: split PTE/PMD PT table Kconfig cleanups+clarifications".

This series is a follow up to the fixes:
	"[PATCH v1 0/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking"

When working on the fixes, I wondered why 8xx is fine (-> never uses split
PT locks) and how PT locking even works properly with PMD page table
sharing (-> always requires split PMD PT locks).

Let's improve the split PT lock detection, make hugetlb properly depend on
it and make 8xx bail out if it would ever get enabled by accident.

As an alternative to patch #3 we could extend the Kconfig
SPLIT_PTE_PTLOCKS option from patch #2 -- but enforcing it closer to the
code that actually implements it feels a bit nicer for documentation
purposes, and there is no need to actually disable it because it should
always be disabled (!SMP).

Did a bunch of cross-compilations to make sure that split PTE/PMD PT locks
are still getting used where we would expect them.

[1] https://lkml.kernel.org/r/[email protected]


This patch (of 3):

Let's clean that up a bit and prepare for depending on
CONFIG_SPLIT_PMD_PTLOCKS in other Kconfig options.

More cleanups would be reasonable (like the arch-specific "depends on" for
CONFIG_SPLIT_PTE_PTLOCKS), but we'll leave that for another day.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Qi Zheng <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: "Naveen N. Rao" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Aug 16, 2024
…ptions

Patch series "mm: split PTE/PMD PT table Kconfig cleanups+clarifications".

This series is a follow up to the fixes:
	"[PATCH v1 0/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking"

When working on the fixes, I wondered why 8xx is fine (-> never uses split
PT locks) and how PT locking even works properly with PMD page table
sharing (-> always requires split PMD PT locks).

Let's improve the split PT lock detection, make hugetlb properly depend on
it and make 8xx bail out if it would ever get enabled by accident.

As an alternative to patch #3 we could extend the Kconfig
SPLIT_PTE_PTLOCKS option from patch #2 -- but enforcing it closer to the
code that actually implements it feels a bit nicer for documentation
purposes, and there is no need to actually disable it because it should
always be disabled (!SMP).

Did a bunch of cross-compilations to make sure that split PTE/PMD PT locks
are still getting used where we would expect them.

[1] https://lkml.kernel.org/r/[email protected]


This patch (of 3):

Let's clean that up a bit and prepare for depending on
CONFIG_SPLIT_PMD_PTLOCKS in other Kconfig options.

More cleanups would be reasonable (like the arch-specific "depends on" for
CONFIG_SPLIT_PTE_PTLOCKS), but we'll leave that for another day.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Qi Zheng <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: "Naveen N. Rao" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Aug 21, 2024
AddressSanitizer found a use-after-free bug in the symbol code which
manifested as 'perf top' segfaulting.

  ==1238389==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b00c48844b at pc 0x5650d8035961 bp 0x7f751aaecc90 sp 0x7f751aaecc80
  READ of size 1 at 0x60b00c48844b thread T193
      #0 0x5650d8035960 in _sort__sym_cmp util/sort.c:310
      #1 0x5650d8043744 in hist_entry__cmp util/hist.c:1286
      #2 0x5650d8043951 in hists__findnew_entry util/hist.c:614
      #3 0x5650d804568f in __hists__add_entry util/hist.c:754
      #4 0x5650d8045bf9 in hists__add_entry util/hist.c:772
      #5 0x5650d8045df1 in iter_add_single_normal_entry util/hist.c:997
      #6 0x5650d8043326 in hist_entry_iter__add util/hist.c:1242
      #7 0x5650d7ceeefe in perf_event__process_sample /home/matt/src/linux/tools/perf/builtin-top.c:845
      #8 0x5650d7ceeefe in deliver_event /home/matt/src/linux/tools/perf/builtin-top.c:1208
      #9 0x5650d7fdb51b in do_flush util/ordered-events.c:245
      #10 0x5650d7fdb51b in __ordered_events__flush util/ordered-events.c:324
      #11 0x5650d7ced743 in process_thread /home/matt/src/linux/tools/perf/builtin-top.c:1120
      #12 0x7f757ef1f133 in start_thread nptl/pthread_create.c:442
      #13 0x7f757ef9f7db in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

When updating hist maps it's also necessary to update the hist symbol
reference because the old one gets freed in map__put().

While this bug was probably introduced with 5c24b67 ("perf
tools: Replace map->referenced & maps->removed_maps with map->refcnt"),
the symbol objects were leaked until c087e94 ("perf machine:
Fix refcount usage when processing PERF_RECORD_KSYMBOL") was merged so
the bug was masked.

Fixes: c087e94 ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL")
Reported-by: Yunzhao Li <[email protected]>
Signed-off-by: Matt Fleming (Cloudflare) <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: [email protected]
Cc: Namhyung Kim <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: [email protected] # v5.13+
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Aug 21, 2024
Lockdep reported a warning in Linux version 6.6:

[  414.344659] ================================
[  414.345155] WARNING: inconsistent lock state
[  414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted
[  414.346221] --------------------------------
[  414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[  414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes:
[  414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0
[  414.351204] {IN-SOFTIRQ-W} state was registered at:
[  414.351751]   lock_acquire+0x18d/0x460
[  414.352218]   _raw_spin_lock_irqsave+0x39/0x60
[  414.352769]   __wake_up_common_lock+0x22/0x60
[  414.353289]   sbitmap_queue_wake_up+0x375/0x4f0
[  414.353829]   sbitmap_queue_clear+0xdd/0x270
[  414.354338]   blk_mq_put_tag+0xdf/0x170
[  414.354807]   __blk_mq_free_request+0x381/0x4d0
[  414.355335]   blk_mq_free_request+0x28b/0x3e0
[  414.355847]   __blk_mq_end_request+0x242/0xc30
[  414.356367]   scsi_end_request+0x2c1/0x830
[  414.345155] WARNING: inconsistent lock state
[  414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted
[  414.346221] --------------------------------
[  414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[  414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes:
[  414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0
[  414.351204] {IN-SOFTIRQ-W} state was registered at:
[  414.351751]   lock_acquire+0x18d/0x460
[  414.352218]   _raw_spin_lock_irqsave+0x39/0x60
[  414.352769]   __wake_up_common_lock+0x22/0x60
[  414.353289]   sbitmap_queue_wake_up+0x375/0x4f0
[  414.353829]   sbitmap_queue_clear+0xdd/0x270
[  414.354338]   blk_mq_put_tag+0xdf/0x170
[  414.354807]   __blk_mq_free_request+0x381/0x4d0
[  414.355335]   blk_mq_free_request+0x28b/0x3e0
[  414.355847]   __blk_mq_end_request+0x242/0xc30
[  414.356367]   scsi_end_request+0x2c1/0x830
[  414.356863]   scsi_io_completion+0x177/0x1610
[  414.357379]   scsi_complete+0x12f/0x260
[  414.357856]   blk_complete_reqs+0xba/0xf0
[  414.358338]   __do_softirq+0x1b0/0x7a2
[  414.358796]   irq_exit_rcu+0x14b/0x1a0
[  414.359262]   sysvec_call_function_single+0xaf/0xc0
[  414.359828]   asm_sysvec_call_function_single+0x1a/0x20
[  414.360426]   default_idle+0x1e/0x30
[  414.360873]   default_idle_call+0x9b/0x1f0
[  414.361390]   do_idle+0x2d2/0x3e0
[  414.361819]   cpu_startup_entry+0x55/0x60
[  414.362314]   start_secondary+0x235/0x2b0
[  414.362809]   secondary_startup_64_no_verify+0x18f/0x19b
[  414.363413] irq event stamp: 428794
[  414.363825] hardirqs last  enabled at (428793): [<ffffffff816bfd1c>] ktime_get+0x1dc/0x200
[  414.364694] hardirqs last disabled at (428794): [<ffffffff85470177>] _raw_spin_lock_irq+0x47/0x50
[  414.365629] softirqs last  enabled at (428444): [<ffffffff85474780>] __do_softirq+0x540/0x7a2
[  414.366522] softirqs last disabled at (428419): [<ffffffff813f65ab>] irq_exit_rcu+0x14b/0x1a0
[  414.367425]
               other info that might help us debug this:
[  414.368194]  Possible unsafe locking scenario:
[  414.368900]        CPU0
[  414.369225]        ----
[  414.369548]   lock(&sbq->ws[i].wait);
[  414.370000]   <Interrupt>
[  414.370342]     lock(&sbq->ws[i].wait);
[  414.370802]
                *** DEADLOCK ***
[  414.371569] 5 locks held by kworker/u10:3/1152:
[  414.372088]  #0: ffff88810130e938 ((wq_completion)writeback){+.+.}-{0:0}, at: process_scheduled_works+0x357/0x13f0
[  414.373180]  #1: ffff88810201fdb8 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x3a3/0x13f0
[  414.374384]  #2: ffffffff86ffbdc0 (rcu_read_lock){....}-{1:2}, at: blk_mq_run_hw_queue+0x637/0xa00
[  414.375342]  #3: ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0
[  414.376377]  #4: ffff888106205a08 (&hctx->dispatch_wait_lock){+.-.}-{2:2}, at: blk_mq_dispatch_rq_list+0x1337/0x1ee0
[  414.378607]
               stack backtrace:
[  414.379177] CPU: 0 PID: 1152 Comm: kworker/u10:3 Not tainted 6.6.0-07439-gba2303cacfda #6
[  414.380032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[  414.381177] Workqueue: writeback wb_workfn (flush-253:0)
[  414.381805] Call Trace:
[  414.382136]  <TASK>
[  414.382429]  dump_stack_lvl+0x91/0xf0
[  414.382884]  mark_lock_irq+0xb3b/0x1260
[  414.383367]  ? __pfx_mark_lock_irq+0x10/0x10
[  414.383889]  ? stack_trace_save+0x8e/0xc0
[  414.384373]  ? __pfx_stack_trace_save+0x10/0x10
[  414.384903]  ? graph_lock+0xcf/0x410
[  414.385350]  ? save_trace+0x3d/0xc70
[  414.385808]  mark_lock.part.20+0x56d/0xa90
[  414.386317]  mark_held_locks+0xb0/0x110
[  414.386791]  ? __pfx_do_raw_spin_lock+0x10/0x10
[  414.387320]  lockdep_hardirqs_on_prepare+0x297/0x3f0
[  414.387901]  ? _raw_spin_unlock_irq+0x28/0x50
[  414.388422]  trace_hardirqs_on+0x58/0x100
[  414.388917]  _raw_spin_unlock_irq+0x28/0x50
[  414.389422]  __blk_mq_tag_busy+0x1d6/0x2a0
[  414.389920]  __blk_mq_get_driver_tag+0x761/0x9f0
[  414.390899]  blk_mq_dispatch_rq_list+0x1780/0x1ee0
[  414.391473]  ? __pfx_blk_mq_dispatch_rq_list+0x10/0x10
[  414.392070]  ? sbitmap_get+0x2b8/0x450
[  414.392533]  ? __blk_mq_get_driver_tag+0x210/0x9f0
[  414.393095]  __blk_mq_sched_dispatch_requests+0xd99/0x1690
[  414.393730]  ? elv_attempt_insert_merge+0x1b1/0x420
[  414.394302]  ? __pfx___blk_mq_sched_dispatch_requests+0x10/0x10
[  414.394970]  ? lock_acquire+0x18d/0x460
[  414.395456]  ? blk_mq_run_hw_queue+0x637/0xa00
[  414.395986]  ? __pfx_lock_acquire+0x10/0x10
[  414.396499]  blk_mq_sched_dispatch_requests+0x109/0x190
[  414.397100]  blk_mq_run_hw_queue+0x66e/0xa00
[  414.397616]  blk_mq_flush_plug_list.part.17+0x614/0x2030
[  414.398244]  ? __pfx_blk_mq_flush_plug_list.part.17+0x10/0x10
[  414.398897]  ? writeback_sb_inodes+0x241/0xcc0
[  414.399429]  blk_mq_flush_plug_list+0x65/0x80
[  414.399957]  __blk_flush_plug+0x2f1/0x530
[  414.400458]  ? __pfx___blk_flush_plug+0x10/0x10
[  414.400999]  blk_finish_plug+0x59/0xa0
[  414.401467]  wb_writeback+0x7cc/0x920
[  414.401935]  ? __pfx_wb_writeback+0x10/0x10
[  414.402442]  ? mark_held_locks+0xb0/0x110
[  414.402931]  ? __pfx_do_raw_spin_lock+0x10/0x10
[  414.403462]  ? lockdep_hardirqs_on_prepare+0x297/0x3f0
[  414.404062]  wb_workfn+0x2b3/0xcf0
[  414.404500]  ? __pfx_wb_workfn+0x10/0x10
[  414.404989]  process_scheduled_works+0x432/0x13f0
[  414.405546]  ? __pfx_process_scheduled_works+0x10/0x10
[  414.406139]  ? do_raw_spin_lock+0x101/0x2a0
[  414.406641]  ? assign_work+0x19b/0x240
[  414.407106]  ? lock_is_held_type+0x9d/0x110
[  414.407604]  worker_thread+0x6f2/0x1160
[  414.408075]  ? __kthread_parkme+0x62/0x210
[  414.408572]  ? lockdep_hardirqs_on_prepare+0x297/0x3f0
[  414.409168]  ? __kthread_parkme+0x13c/0x210
[  414.409678]  ? __pfx_worker_thread+0x10/0x10
[  414.410191]  kthread+0x33c/0x440
[  414.410602]  ? __pfx_kthread+0x10/0x10
[  414.411068]  ret_from_fork+0x4d/0x80
[  414.411526]  ? __pfx_kthread+0x10/0x10
[  414.411993]  ret_from_fork_asm+0x1b/0x30
[  414.412489]  </TASK>

When interrupt is turned on while a lock holding by spin_lock_irq it
throws a warning because of potential deadlock.

blk_mq_prep_dispatch_rq
 blk_mq_get_driver_tag
  __blk_mq_get_driver_tag
   __blk_mq_alloc_driver_tag
    blk_mq_tag_busy -> tag is already busy
    // failed to get driver tag
 blk_mq_mark_tag_wait
  spin_lock_irq(&wq->lock) -> lock A (&sbq->ws[i].wait)
  __add_wait_queue(wq, wait) -> wait queue active
  blk_mq_get_driver_tag
  __blk_mq_tag_busy
-> 1) tag must be idle, which means there can't be inflight IO
   spin_lock_irq(&tags->lock) -> lock B (hctx->tags)
   spin_unlock_irq(&tags->lock) -> unlock B, turn on interrupt accidentally
-> 2) context must be preempt by IO interrupt to trigger deadlock.

As shown above, the deadlock is not possible in theory, but the warning
still need to be fixed.

Fix it by using spin_lock_irqsave to get lockB instead of spin_lock_irq.

Fixes: 4f1731d ("blk-mq: fix potential io hang by wrong 'wake_batch'")
Signed-off-by: Li Lingfeng <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Yu Kuai <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Aug 21, 2024
…ptions

Patch series "mm: split PTE/PMD PT table Kconfig cleanups+clarifications".

This series is a follow up to the fixes:
	"[PATCH v1 0/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking"

When working on the fixes, I wondered why 8xx is fine (-> never uses split
PT locks) and how PT locking even works properly with PMD page table
sharing (-> always requires split PMD PT locks).

Let's improve the split PT lock detection, make hugetlb properly depend on
it and make 8xx bail out if it would ever get enabled by accident.

As an alternative to patch #3 we could extend the Kconfig
SPLIT_PTE_PTLOCKS option from patch #2 -- but enforcing it closer to the
code that actually implements it feels a bit nicer for documentation
purposes, and there is no need to actually disable it because it should
always be disabled (!SMP).

Did a bunch of cross-compilations to make sure that split PTE/PMD PT locks
are still getting used where we would expect them.

[1] https://lkml.kernel.org/r/[email protected]


This patch (of 3):

Let's clean that up a bit and prepare for depending on
CONFIG_SPLIT_PMD_PTLOCKS in other Kconfig options.

More cleanups would be reasonable (like the arch-specific "depends on" for
CONFIG_SPLIT_PTE_PTLOCKS), but we'll leave that for another day.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Qi Zheng <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: "Naveen N. Rao" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Aug 21, 2024
Ido Schimmel says:

====================
Preparations for FIB rule DSCP selector

This patchset moves the masking of the upper DSCP bits in 'flowi4_tos'
to the core instead of relying on callers of the FIB lookup API to do
it.

This will allow us to start changing users of the API to initialize the
'flowi4_tos' field with all six bits of the DSCP field. In turn, this
will allow us to extend FIB rules with a new DSCP selector.

By masking the upper DSCP bits in the core we are able to maintain the
behavior of the TOS selector in FIB rules and routes to only match on
the lower DSCP bits.

While working on this I found two users of the API that do not mask the
upper DSCP bits before performing the lookup. The first is an ancient
netlink family that is unlikely to be used. It is adjusted in patch #1
to mask both the upper DSCP bits and the ECN bits before calling the
API.

The second user is a nftables module that differs in this regard from
its equivalent iptables module. It is adjusted in patch #2 to invoke the
API with the upper DSCP bits masked, like all other callers. The
relevant selftest passed, but in the unlikely case that regressions are
reported because of this change, we can restore the existing behavior
using a new flow information flag as discussed here [1].

The last patch moves the masking of the upper DSCP bits to the core,
making the first two patches redundant, but I wanted to post them
separately to call attention to the behavior change for these two users
of the FIB lookup API.

Future patchsets (around 3) will start unmasking the upper DSCP bits
throughout the networking stack before adding support for the new FIB
rule DSCP selector.

Changes from v1 [2]:

Patch #3: Include <linux/ip.h> in <linux/in_route.h> instead of
including it in net/ip_fib.h

[1] https://lore.kernel.org/netdev/ZpqpB8vJU%2FQ6LSqa@debian/
[2] https://lore.kernel.org/netdev/[email protected]/
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Sep 5, 2024
commit 823430c ("memory tier: consolidate the initialization of
memory tiers") introduces a locking change that use guard(mutex) to
instead of mutex_lock/unlock() for memory_tier_lock.  It unexpectedly
expanded the locked region to include the hotplug_memory_notifier(), as a
result, it triggers an locking dependency detected of ABBA deadlock. 
Exclude hotplug_memory_notifier() from the locked region to fixing it.

The deadlock scenario is that when a memory online event occurs, the
execution of memory notifier will access the read lock of the
memory_chain.rwsem, then the reigistration of the memory notifier in
memory_tier_init() acquires the write lock of the memory_chain.rwsem while
holding memory_tier_lock.  Then the memory online event continues to
invoke the memory hotplug callback registered by memory_tier_init(). 
Since this callback tries to acquire the memory_tier_lock, a deadlock
occurs.

In fact, this deadlock can't happen because memory_tier_init() always
executes before memory online events happen due to the subsys_initcall()
has an higher priority than module_init().

[  133.491106] WARNING: possible circular locking dependency detected
[  133.493656] 6.11.0-rc2+ #146 Tainted: G           O     N
[  133.504290] ------------------------------------------------------
[  133.515194] (udev-worker)/1133 is trying to acquire lock:
[  133.525715] ffffffff87044e28 (memory_tier_lock){+.+.}-{3:3}, at: memtier_hotplug_callback+0x383/0x4b0
[  133.536449]
[  133.536449] but task is already holding lock:
[  133.549847] ffffffff875d3310 ((memory_chain).rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x60/0xb0
[  133.556781]
[  133.556781] which lock already depends on the new lock.
[  133.556781]
[  133.569957]
[  133.569957] the existing dependency chain (in reverse order) is:
[  133.577618]
[  133.577618] -> #1 ((memory_chain).rwsem){++++}-{3:3}:
[  133.584997]        down_write+0x97/0x210
[  133.588647]        blocking_notifier_chain_register+0x71/0xd0
[  133.592537]        register_memory_notifier+0x26/0x30
[  133.596314]        memory_tier_init+0x187/0x300
[  133.599864]        do_one_initcall+0x117/0x5d0
[  133.603399]        kernel_init_freeable+0xab0/0xeb0
[  133.606986]        kernel_init+0x28/0x2f0
[  133.610312]        ret_from_fork+0x59/0x90
[  133.613652]        ret_from_fork_asm+0x1a/0x30
[  133.617012]
[  133.617012] -> #0 (memory_tier_lock){+.+.}-{3:3}:
[  133.623390]        __lock_acquire+0x2efd/0x5c60
[  133.626730]        lock_acquire+0x1ce/0x580
[  133.629757]        __mutex_lock+0x15c/0x1490
[  133.632731]        mutex_lock_nested+0x1f/0x30
[  133.635717]        memtier_hotplug_callback+0x383/0x4b0
[  133.638748]        notifier_call_chain+0xbf/0x370
[  133.641647]        blocking_notifier_call_chain+0x76/0xb0
[  133.644636]        memory_notify+0x2e/0x40
[  133.647427]        online_pages+0x597/0x720
[  133.650246]        memory_subsys_online+0x4f6/0x7f0
[  133.653107]        device_online+0x141/0x1d0
[  133.655831]        online_memory_block+0x4d/0x60
[  133.658616]        walk_memory_blocks+0xc0/0x120
[  133.661419]        add_memory_resource+0x51d/0x6c0
[  133.664202]        add_memory_driver_managed+0xf5/0x180
[  133.667060]        dev_dax_kmem_probe+0x7f7/0xb40 [kmem]
[  133.669949]        dax_bus_probe+0x147/0x230
[  133.672687]        really_probe+0x27f/0xac0
[  133.675463]        __driver_probe_device+0x1f3/0x460
[  133.678493]        driver_probe_device+0x56/0x1b0
[  133.681366]        __driver_attach+0x277/0x570
[  133.684149]        bus_for_each_dev+0x145/0x1e0
[  133.686937]        driver_attach+0x49/0x60
[  133.689673]        bus_add_driver+0x2f3/0x6b0
[  133.692421]        driver_register+0x170/0x4b0
[  133.695118]        __dax_driver_register+0x141/0x1b0
[  133.697910]        dax_kmem_init+0x54/0xff0 [kmem]
[  133.700794]        do_one_initcall+0x117/0x5d0
[  133.703455]        do_init_module+0x277/0x750
[  133.706054]        load_module+0x5d1d/0x74f0
[  133.708602]        init_module_from_file+0x12c/0x1a0
[  133.711234]        idempotent_init_module+0x3f1/0x690
[  133.713937]        __x64_sys_finit_module+0x10e/0x1a0
[  133.716492]        x64_sys_call+0x184d/0x20d0
[  133.719053]        do_syscall_64+0x6d/0x140
[  133.721537]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  133.724239]
[  133.724239] other info that might help us debug this:
[  133.724239]
[  133.730832]  Possible unsafe locking scenario:
[  133.730832]
[  133.735298]        CPU0                    CPU1
[  133.737759]        ----                    ----
[  133.740165]   rlock((memory_chain).rwsem);
[  133.742623]                                lock(memory_tier_lock);
[  133.745357]                                lock((memory_chain).rwsem);
[  133.748141]   lock(memory_tier_lock);
[  133.750489]
[  133.750489]  *** DEADLOCK ***
[  133.750489]
[  133.756742] 6 locks held by (udev-worker)/1133:
[  133.759179]  #0: ffff888207be6158 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x26c/0x570
[  133.762299]  #1: ffffffff875b5868 (device_hotplug_lock){+.+.}-{3:3}, at: lock_device_hotplug+0x20/0x30
[  133.765565]  #2: ffff88820cf6a108 (&dev->mutex){....}-{3:3}, at: device_online+0x2f/0x1d0
[  133.768978]  #3: ffffffff86d08ff0 (cpu_hotplug_lock){++++}-{0:0}, at: mem_hotplug_begin+0x17/0x30
[  133.772312]  #4: ffffffff8702dfb0 (mem_hotplug_lock){++++}-{0:0}, at: mem_hotplug_begin+0x23/0x30
[  133.775544]  #5: ffffffff875d3310 ((memory_chain).rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x60/0xb0
[  133.779113]
[  133.779113] stack backtrace:
[  133.783728] CPU: 5 UID: 0 PID: 1133 Comm: (udev-worker) Tainted: G           O     N 6.11.0-rc2+ #146
[  133.787220] Tainted: [O]=OOT_MODULE, [N]=TEST
[  133.789948] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[  133.793291] Call Trace:
[  133.795826]  <TASK>
[  133.798284]  dump_stack_lvl+0xea/0x150
[  133.801025]  dump_stack+0x19/0x20
[  133.803609]  print_circular_bug+0x477/0x740
[  133.806341]  check_noncircular+0x2f4/0x3e0
[  133.809056]  ? __pfx_check_noncircular+0x10/0x10
[  133.811866]  ? __pfx_lockdep_lock+0x10/0x10
[  133.814670]  ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[  133.817610]  __lock_acquire+0x2efd/0x5c60
[  133.820339]  ? __pfx___lock_acquire+0x10/0x10
[  133.823128]  ? __dax_driver_register+0x141/0x1b0
[  133.825926]  ? do_one_initcall+0x117/0x5d0
[  133.828648]  lock_acquire+0x1ce/0x580
[  133.831349]  ? memtier_hotplug_callback+0x383/0x4b0
[  133.834293]  ? __pfx_lock_acquire+0x10/0x10
[  133.837134]  __mutex_lock+0x15c/0x1490
[  133.839829]  ? memtier_hotplug_callback+0x383/0x4b0
[  133.842753]  ? memtier_hotplug_callback+0x383/0x4b0
[  133.845602]  ? __this_cpu_preempt_check+0x21/0x30
[  133.848438]  ? __pfx___mutex_lock+0x10/0x10
[  133.851200]  ? __pfx_lock_acquire+0x10/0x10
[  133.853935]  ? global_dirty_limits+0xc0/0x160
[  133.856699]  ? __sanitizer_cov_trace_switch+0x58/0xa0
[  133.859564]  mutex_lock_nested+0x1f/0x30
[  133.862251]  ? mutex_lock_nested+0x1f/0x30
[  133.864964]  memtier_hotplug_callback+0x383/0x4b0
[  133.867752]  notifier_call_chain+0xbf/0x370
[  133.870550]  ? writeback_set_ratelimit+0xe8/0x160
[  133.873372]  blocking_notifier_call_chain+0x76/0xb0
[  133.876311]  memory_notify+0x2e/0x40
[  133.879013]  online_pages+0x597/0x720
[  133.881686]  ? irqentry_exit+0x3e/0xa0
[  133.884397]  ? __pfx_online_pages+0x10/0x10
[  133.887244]  ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[  133.890299]  ? mhp_init_memmap_on_memory+0x7a/0x1c0
[  133.893203]  memory_subsys_online+0x4f6/0x7f0
[  133.896099]  ? __pfx_memory_subsys_online+0x10/0x10
[  133.899039]  ? xa_load+0x16d/0x2e0
[  133.901667]  ? __pfx_xa_load+0x10/0x10
[  133.904366]  ? __pfx_memory_subsys_online+0x10/0x10
[  133.907218]  device_online+0x141/0x1d0
[  133.909845]  online_memory_block+0x4d/0x60
[  133.912494]  walk_memory_blocks+0xc0/0x120
[  133.915104]  ? __pfx_online_memory_block+0x10/0x10
[  133.917776]  add_memory_resource+0x51d/0x6c0
[  133.920404]  ? __pfx_add_memory_resource+0x10/0x10
[  133.923104]  ? _raw_write_unlock+0x31/0x60
[  133.925781]  ? register_memory_resource+0x119/0x180
[  133.928450]  add_memory_driver_managed+0xf5/0x180
[  133.931036]  dev_dax_kmem_probe+0x7f7/0xb40 [kmem]
[  133.933665]  ? __pfx_dev_dax_kmem_probe+0x10/0x10 [kmem]
[  133.936332]  ? __pfx___up_read+0x10/0x10
[  133.938878]  dax_bus_probe+0x147/0x230
[  133.941332]  ? __pfx_dax_bus_probe+0x10/0x10
[  133.943954]  really_probe+0x27f/0xac0
[  133.946387]  ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30
[  133.949106]  __driver_probe_device+0x1f3/0x460
[  133.951704]  ? parse_option_str+0x149/0x190
[  133.954241]  driver_probe_device+0x56/0x1b0
[  133.956749]  __driver_attach+0x277/0x570
[  133.959228]  ? __pfx___driver_attach+0x10/0x10
[  133.961776]  bus_for_each_dev+0x145/0x1e0
[  133.964367]  ? __pfx_bus_for_each_dev+0x10/0x10
[  133.967019]  ? __kasan_check_read+0x15/0x20
[  133.969543]  ? _raw_spin_unlock+0x31/0x60
[  133.972132]  driver_attach+0x49/0x60
[  133.974536]  bus_add_driver+0x2f3/0x6b0
[  133.977044]  driver_register+0x170/0x4b0
[  133.979480]  __dax_driver_register+0x141/0x1b0
[  133.982126]  ? __pfx_dax_kmem_init+0x10/0x10 [kmem]
[  133.984724]  dax_kmem_init+0x54/0xff0 [kmem]
[  133.987284]  ? __pfx_dax_kmem_init+0x10/0x10 [kmem]
[  133.989965]  do_one_initcall+0x117/0x5d0
[  133.992506]  ? __pfx_do_one_initcall+0x10/0x10
[  133.995185]  ? __kasan_kmalloc+0x88/0xa0
[  133.997748]  ? kasan_poison+0x3e/0x60
[  134.000288]  ? kasan_unpoison+0x2c/0x60
[  134.002762]  ? kasan_poison+0x3e/0x60
[  134.005202]  ? __asan_register_globals+0x62/0x80
[  134.007753]  ? __pfx_dax_kmem_init+0x10/0x10 [kmem]
[  134.010439]  do_init_module+0x277/0x750
[  134.012953]  load_module+0x5d1d/0x74f0
[  134.015406]  ? __pfx_load_module+0x10/0x10
[  134.017887]  ? __pfx_ima_post_read_file+0x10/0x10
[  134.020470]  ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[  134.023127]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[  134.025767]  ? security_kernel_post_read_file+0xa2/0xd0
[  134.028429]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[  134.031162]  ? kernel_read_file+0x503/0x820
[  134.033645]  ? __pfx_kernel_read_file+0x10/0x10
[  134.036232]  ? __pfx___lock_acquire+0x10/0x10
[  134.038766]  init_module_from_file+0x12c/0x1a0
[  134.041291]  ? init_module_from_file+0x12c/0x1a0
[  134.043936]  ? __pfx_init_module_from_file+0x10/0x10
[  134.046516]  ? __this_cpu_preempt_check+0x21/0x30
[  134.049091]  ? __kasan_check_read+0x15/0x20
[  134.051551]  ? do_raw_spin_unlock+0x60/0x210
[  134.054077]  idempotent_init_module+0x3f1/0x690
[  134.056643]  ? __pfx_idempotent_init_module+0x10/0x10
[  134.059318]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[  134.061995]  ? __fget_light+0x17d/0x210
[  134.064428]  __x64_sys_finit_module+0x10e/0x1a0
[  134.066976]  x64_sys_call+0x184d/0x20d0
[  134.069405]  do_syscall_64+0x6d/0x140
[  134.071926]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 823430c ("memory tier: consolidate the initialization of memory tiers")
Signed-off-by: Yanfei Xu <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Cc: Ho-Ren (Jack) Chuang <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Sep 5, 2024
commit 823430c ("memory tier: consolidate the initialization of
memory tiers") introduces a locking change that use guard(mutex) to
instead of mutex_lock/unlock() for memory_tier_lock.  It unexpectedly
expanded the locked region to include the hotplug_memory_notifier(), as a
result, it triggers an locking dependency detected of ABBA deadlock. 
Exclude hotplug_memory_notifier() from the locked region to fixing it.

The deadlock scenario is that when a memory online event occurs, the
execution of memory notifier will access the read lock of the
memory_chain.rwsem, then the reigistration of the memory notifier in
memory_tier_init() acquires the write lock of the memory_chain.rwsem while
holding memory_tier_lock.  Then the memory online event continues to
invoke the memory hotplug callback registered by memory_tier_init(). 
Since this callback tries to acquire the memory_tier_lock, a deadlock
occurs.

In fact, this deadlock can't happen because memory_tier_init() always
executes before memory online events happen due to the subsys_initcall()
has an higher priority than module_init().

[  133.491106] WARNING: possible circular locking dependency detected
[  133.493656] 6.11.0-rc2+ #146 Tainted: G           O     N
[  133.504290] ------------------------------------------------------
[  133.515194] (udev-worker)/1133 is trying to acquire lock:
[  133.525715] ffffffff87044e28 (memory_tier_lock){+.+.}-{3:3}, at: memtier_hotplug_callback+0x383/0x4b0
[  133.536449]
[  133.536449] but task is already holding lock:
[  133.549847] ffffffff875d3310 ((memory_chain).rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x60/0xb0
[  133.556781]
[  133.556781] which lock already depends on the new lock.
[  133.556781]
[  133.569957]
[  133.569957] the existing dependency chain (in reverse order) is:
[  133.577618]
[  133.577618] -> #1 ((memory_chain).rwsem){++++}-{3:3}:
[  133.584997]        down_write+0x97/0x210
[  133.588647]        blocking_notifier_chain_register+0x71/0xd0
[  133.592537]        register_memory_notifier+0x26/0x30
[  133.596314]        memory_tier_init+0x187/0x300
[  133.599864]        do_one_initcall+0x117/0x5d0
[  133.603399]        kernel_init_freeable+0xab0/0xeb0
[  133.606986]        kernel_init+0x28/0x2f0
[  133.610312]        ret_from_fork+0x59/0x90
[  133.613652]        ret_from_fork_asm+0x1a/0x30
[  133.617012]
[  133.617012] -> #0 (memory_tier_lock){+.+.}-{3:3}:
[  133.623390]        __lock_acquire+0x2efd/0x5c60
[  133.626730]        lock_acquire+0x1ce/0x580
[  133.629757]        __mutex_lock+0x15c/0x1490
[  133.632731]        mutex_lock_nested+0x1f/0x30
[  133.635717]        memtier_hotplug_callback+0x383/0x4b0
[  133.638748]        notifier_call_chain+0xbf/0x370
[  133.641647]        blocking_notifier_call_chain+0x76/0xb0
[  133.644636]        memory_notify+0x2e/0x40
[  133.647427]        online_pages+0x597/0x720
[  133.650246]        memory_subsys_online+0x4f6/0x7f0
[  133.653107]        device_online+0x141/0x1d0
[  133.655831]        online_memory_block+0x4d/0x60
[  133.658616]        walk_memory_blocks+0xc0/0x120
[  133.661419]        add_memory_resource+0x51d/0x6c0
[  133.664202]        add_memory_driver_managed+0xf5/0x180
[  133.667060]        dev_dax_kmem_probe+0x7f7/0xb40 [kmem]
[  133.669949]        dax_bus_probe+0x147/0x230
[  133.672687]        really_probe+0x27f/0xac0
[  133.675463]        __driver_probe_device+0x1f3/0x460
[  133.678493]        driver_probe_device+0x56/0x1b0
[  133.681366]        __driver_attach+0x277/0x570
[  133.684149]        bus_for_each_dev+0x145/0x1e0
[  133.686937]        driver_attach+0x49/0x60
[  133.689673]        bus_add_driver+0x2f3/0x6b0
[  133.692421]        driver_register+0x170/0x4b0
[  133.695118]        __dax_driver_register+0x141/0x1b0
[  133.697910]        dax_kmem_init+0x54/0xff0 [kmem]
[  133.700794]        do_one_initcall+0x117/0x5d0
[  133.703455]        do_init_module+0x277/0x750
[  133.706054]        load_module+0x5d1d/0x74f0
[  133.708602]        init_module_from_file+0x12c/0x1a0
[  133.711234]        idempotent_init_module+0x3f1/0x690
[  133.713937]        __x64_sys_finit_module+0x10e/0x1a0
[  133.716492]        x64_sys_call+0x184d/0x20d0
[  133.719053]        do_syscall_64+0x6d/0x140
[  133.721537]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  133.724239]
[  133.724239] other info that might help us debug this:
[  133.724239]
[  133.730832]  Possible unsafe locking scenario:
[  133.730832]
[  133.735298]        CPU0                    CPU1
[  133.737759]        ----                    ----
[  133.740165]   rlock((memory_chain).rwsem);
[  133.742623]                                lock(memory_tier_lock);
[  133.745357]                                lock((memory_chain).rwsem);
[  133.748141]   lock(memory_tier_lock);
[  133.750489]
[  133.750489]  *** DEADLOCK ***
[  133.750489]
[  133.756742] 6 locks held by (udev-worker)/1133:
[  133.759179]  #0: ffff888207be6158 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x26c/0x570
[  133.762299]  #1: ffffffff875b5868 (device_hotplug_lock){+.+.}-{3:3}, at: lock_device_hotplug+0x20/0x30
[  133.765565]  #2: ffff88820cf6a108 (&dev->mutex){....}-{3:3}, at: device_online+0x2f/0x1d0
[  133.768978]  #3: ffffffff86d08ff0 (cpu_hotplug_lock){++++}-{0:0}, at: mem_hotplug_begin+0x17/0x30
[  133.772312]  #4: ffffffff8702dfb0 (mem_hotplug_lock){++++}-{0:0}, at: mem_hotplug_begin+0x23/0x30
[  133.775544]  #5: ffffffff875d3310 ((memory_chain).rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x60/0xb0
[  133.779113]
[  133.779113] stack backtrace:
[  133.783728] CPU: 5 UID: 0 PID: 1133 Comm: (udev-worker) Tainted: G           O     N 6.11.0-rc2+ #146
[  133.787220] Tainted: [O]=OOT_MODULE, [N]=TEST
[  133.789948] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[  133.793291] Call Trace:
[  133.795826]  <TASK>
[  133.798284]  dump_stack_lvl+0xea/0x150
[  133.801025]  dump_stack+0x19/0x20
[  133.803609]  print_circular_bug+0x477/0x740
[  133.806341]  check_noncircular+0x2f4/0x3e0
[  133.809056]  ? __pfx_check_noncircular+0x10/0x10
[  133.811866]  ? __pfx_lockdep_lock+0x10/0x10
[  133.814670]  ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[  133.817610]  __lock_acquire+0x2efd/0x5c60
[  133.820339]  ? __pfx___lock_acquire+0x10/0x10
[  133.823128]  ? __dax_driver_register+0x141/0x1b0
[  133.825926]  ? do_one_initcall+0x117/0x5d0
[  133.828648]  lock_acquire+0x1ce/0x580
[  133.831349]  ? memtier_hotplug_callback+0x383/0x4b0
[  133.834293]  ? __pfx_lock_acquire+0x10/0x10
[  133.837134]  __mutex_lock+0x15c/0x1490
[  133.839829]  ? memtier_hotplug_callback+0x383/0x4b0
[  133.842753]  ? memtier_hotplug_callback+0x383/0x4b0
[  133.845602]  ? __this_cpu_preempt_check+0x21/0x30
[  133.848438]  ? __pfx___mutex_lock+0x10/0x10
[  133.851200]  ? __pfx_lock_acquire+0x10/0x10
[  133.853935]  ? global_dirty_limits+0xc0/0x160
[  133.856699]  ? __sanitizer_cov_trace_switch+0x58/0xa0
[  133.859564]  mutex_lock_nested+0x1f/0x30
[  133.862251]  ? mutex_lock_nested+0x1f/0x30
[  133.864964]  memtier_hotplug_callback+0x383/0x4b0
[  133.867752]  notifier_call_chain+0xbf/0x370
[  133.870550]  ? writeback_set_ratelimit+0xe8/0x160
[  133.873372]  blocking_notifier_call_chain+0x76/0xb0
[  133.876311]  memory_notify+0x2e/0x40
[  133.879013]  online_pages+0x597/0x720
[  133.881686]  ? irqentry_exit+0x3e/0xa0
[  133.884397]  ? __pfx_online_pages+0x10/0x10
[  133.887244]  ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[  133.890299]  ? mhp_init_memmap_on_memory+0x7a/0x1c0
[  133.893203]  memory_subsys_online+0x4f6/0x7f0
[  133.896099]  ? __pfx_memory_subsys_online+0x10/0x10
[  133.899039]  ? xa_load+0x16d/0x2e0
[  133.901667]  ? __pfx_xa_load+0x10/0x10
[  133.904366]  ? __pfx_memory_subsys_online+0x10/0x10
[  133.907218]  device_online+0x141/0x1d0
[  133.909845]  online_memory_block+0x4d/0x60
[  133.912494]  walk_memory_blocks+0xc0/0x120
[  133.915104]  ? __pfx_online_memory_block+0x10/0x10
[  133.917776]  add_memory_resource+0x51d/0x6c0
[  133.920404]  ? __pfx_add_memory_resource+0x10/0x10
[  133.923104]  ? _raw_write_unlock+0x31/0x60
[  133.925781]  ? register_memory_resource+0x119/0x180
[  133.928450]  add_memory_driver_managed+0xf5/0x180
[  133.931036]  dev_dax_kmem_probe+0x7f7/0xb40 [kmem]
[  133.933665]  ? __pfx_dev_dax_kmem_probe+0x10/0x10 [kmem]
[  133.936332]  ? __pfx___up_read+0x10/0x10
[  133.938878]  dax_bus_probe+0x147/0x230
[  133.941332]  ? __pfx_dax_bus_probe+0x10/0x10
[  133.943954]  really_probe+0x27f/0xac0
[  133.946387]  ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30
[  133.949106]  __driver_probe_device+0x1f3/0x460
[  133.951704]  ? parse_option_str+0x149/0x190
[  133.954241]  driver_probe_device+0x56/0x1b0
[  133.956749]  __driver_attach+0x277/0x570
[  133.959228]  ? __pfx___driver_attach+0x10/0x10
[  133.961776]  bus_for_each_dev+0x145/0x1e0
[  133.964367]  ? __pfx_bus_for_each_dev+0x10/0x10
[  133.967019]  ? __kasan_check_read+0x15/0x20
[  133.969543]  ? _raw_spin_unlock+0x31/0x60
[  133.972132]  driver_attach+0x49/0x60
[  133.974536]  bus_add_driver+0x2f3/0x6b0
[  133.977044]  driver_register+0x170/0x4b0
[  133.979480]  __dax_driver_register+0x141/0x1b0
[  133.982126]  ? __pfx_dax_kmem_init+0x10/0x10 [kmem]
[  133.984724]  dax_kmem_init+0x54/0xff0 [kmem]
[  133.987284]  ? __pfx_dax_kmem_init+0x10/0x10 [kmem]
[  133.989965]  do_one_initcall+0x117/0x5d0
[  133.992506]  ? __pfx_do_one_initcall+0x10/0x10
[  133.995185]  ? __kasan_kmalloc+0x88/0xa0
[  133.997748]  ? kasan_poison+0x3e/0x60
[  134.000288]  ? kasan_unpoison+0x2c/0x60
[  134.002762]  ? kasan_poison+0x3e/0x60
[  134.005202]  ? __asan_register_globals+0x62/0x80
[  134.007753]  ? __pfx_dax_kmem_init+0x10/0x10 [kmem]
[  134.010439]  do_init_module+0x277/0x750
[  134.012953]  load_module+0x5d1d/0x74f0
[  134.015406]  ? __pfx_load_module+0x10/0x10
[  134.017887]  ? __pfx_ima_post_read_file+0x10/0x10
[  134.020470]  ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[  134.023127]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[  134.025767]  ? security_kernel_post_read_file+0xa2/0xd0
[  134.028429]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[  134.031162]  ? kernel_read_file+0x503/0x820
[  134.033645]  ? __pfx_kernel_read_file+0x10/0x10
[  134.036232]  ? __pfx___lock_acquire+0x10/0x10
[  134.038766]  init_module_from_file+0x12c/0x1a0
[  134.041291]  ? init_module_from_file+0x12c/0x1a0
[  134.043936]  ? __pfx_init_module_from_file+0x10/0x10
[  134.046516]  ? __this_cpu_preempt_check+0x21/0x30
[  134.049091]  ? __kasan_check_read+0x15/0x20
[  134.051551]  ? do_raw_spin_unlock+0x60/0x210
[  134.054077]  idempotent_init_module+0x3f1/0x690
[  134.056643]  ? __pfx_idempotent_init_module+0x10/0x10
[  134.059318]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[  134.061995]  ? __fget_light+0x17d/0x210
[  134.064428]  __x64_sys_finit_module+0x10e/0x1a0
[  134.066976]  x64_sys_call+0x184d/0x20d0
[  134.069405]  do_syscall_64+0x6d/0x140
[  134.071926]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 823430c ("memory tier: consolidate the initialization of memory tiers")
Signed-off-by: Yanfei Xu <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Cc: Ho-Ren (Jack) Chuang <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Sep 23, 2024
Use a dedicated mutex to guard kvm_usage_count to fix a potential deadlock
on x86 due to a chain of locks and SRCU synchronizations.  Translating the
below lockdep splat, CPU1 #6 will wait on CPU0 #1, CPU0 #8 will wait on
CPU2 #3, and CPU2 #7 will wait on CPU1 #4 (if there's a writer, due to the
fairness of r/w semaphores).

    CPU0                     CPU1                     CPU2
1   lock(&kvm->slots_lock);
2                                                     lock(&vcpu->mutex);
3                                                     lock(&kvm->srcu);
4                            lock(cpu_hotplug_lock);
5                            lock(kvm_lock);
6                            lock(&kvm->slots_lock);
7                                                     lock(cpu_hotplug_lock);
8   sync(&kvm->srcu);

Note, there are likely more potential deadlocks in KVM x86, e.g. the same
pattern of taking cpu_hotplug_lock outside of kvm_lock likely exists with
__kvmclock_cpufreq_notifier():

  cpuhp_cpufreq_online()
  |
  -> cpufreq_online()
     |
     -> cpufreq_gov_performance_limits()
        |
        -> __cpufreq_driver_target()
           |
           -> __target_index()
              |
              -> cpufreq_freq_transition_begin()
                 |
                 -> cpufreq_notify_transition()
                    |
                    -> ... __kvmclock_cpufreq_notifier()

But, actually triggering such deadlocks is beyond rare due to the
combination of dependencies and timings involved.  E.g. the cpufreq
notifier is only used on older CPUs without a constant TSC, mucking with
the NX hugepage mitigation while VMs are running is very uncommon, and
doing so while also onlining/offlining a CPU (necessary to generate
contention on cpu_hotplug_lock) would be even more unusual.

The most robust solution to the general cpu_hotplug_lock issue is likely
to switch vm_list to be an RCU-protected list, e.g. so that x86's cpufreq
notifier doesn't to take kvm_lock.  For now, settle for fixing the most
blatant deadlock, as switching to an RCU-protected list is a much more
involved change, but add a comment in locking.rst to call out that care
needs to be taken when walking holding kvm_lock and walking vm_list.

  ======================================================
  WARNING: possible circular locking dependency detected
  6.10.0-smp--c257535a0c9d-pip #330 Tainted: G S         O
  ------------------------------------------------------
  tee/35048 is trying to acquire lock:
  ff6a80eced71e0a8 (&kvm->slots_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x179/0x1e0 [kvm]

  but task is already holding lock:
  ffffffffc07abb08 (kvm_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x14a/0x1e0 [kvm]

  which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

  -> #3 (kvm_lock){+.+.}-{3:3}:
         __mutex_lock+0x6a/0xb40
         mutex_lock_nested+0x1f/0x30
         kvm_dev_ioctl+0x4fb/0xe50 [kvm]
         __se_sys_ioctl+0x7b/0xd0
         __x64_sys_ioctl+0x21/0x30
         x64_sys_call+0x15d0/0x2e60
         do_syscall_64+0x83/0x160
         entry_SYSCALL_64_after_hwframe+0x76/0x7e

  -> #2 (cpu_hotplug_lock){++++}-{0:0}:
         cpus_read_lock+0x2e/0xb0
         static_key_slow_inc+0x16/0x30
         kvm_lapic_set_base+0x6a/0x1c0 [kvm]
         kvm_set_apic_base+0x8f/0xe0 [kvm]
         kvm_set_msr_common+0x9ae/0xf80 [kvm]
         vmx_set_msr+0xa54/0xbe0 [kvm_intel]
         __kvm_set_msr+0xb6/0x1a0 [kvm]
         kvm_arch_vcpu_ioctl+0xeca/0x10c0 [kvm]
         kvm_vcpu_ioctl+0x485/0x5b0 [kvm]
         __se_sys_ioctl+0x7b/0xd0
         __x64_sys_ioctl+0x21/0x30
         x64_sys_call+0x15d0/0x2e60
         do_syscall_64+0x83/0x160
         entry_SYSCALL_64_after_hwframe+0x76/0x7e

  -> #1 (&kvm->srcu){.+.+}-{0:0}:
         __synchronize_srcu+0x44/0x1a0
         synchronize_srcu_expedited+0x21/0x30
         kvm_swap_active_memslots+0x110/0x1c0 [kvm]
         kvm_set_memslot+0x360/0x620 [kvm]
         __kvm_set_memory_region+0x27b/0x300 [kvm]
         kvm_vm_ioctl_set_memory_region+0x43/0x60 [kvm]
         kvm_vm_ioctl+0x295/0x650 [kvm]
         __se_sys_ioctl+0x7b/0xd0
         __x64_sys_ioctl+0x21/0x30
         x64_sys_call+0x15d0/0x2e60
         do_syscall_64+0x83/0x160
         entry_SYSCALL_64_after_hwframe+0x76/0x7e

  -> #0 (&kvm->slots_lock){+.+.}-{3:3}:
         __lock_acquire+0x15ef/0x2e30
         lock_acquire+0xe0/0x260
         __mutex_lock+0x6a/0xb40
         mutex_lock_nested+0x1f/0x30
         set_nx_huge_pages+0x179/0x1e0 [kvm]
         param_attr_store+0x93/0x100
         module_attr_store+0x22/0x40
         sysfs_kf_write+0x81/0xb0
         kernfs_fop_write_iter+0x133/0x1d0
         vfs_write+0x28d/0x380
         ksys_write+0x70/0xe0
         __x64_sys_write+0x1f/0x30
         x64_sys_call+0x281b/0x2e60
         do_syscall_64+0x83/0x160
         entry_SYSCALL_64_after_hwframe+0x76/0x7e

Cc: Chao Gao <[email protected]>
Fixes: 0bf5049 ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock")
Cc: [email protected]
Reviewed-by: Kai Huang <[email protected]>
Acked-by: Kai Huang <[email protected]>
Tested-by: Farrah Chen <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Oct 1, 2024
…git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

v2: with kdoc fixes per Paolo Abeni.

The following patchset contains Netfilter fixes for net:

Patch #1 and #2 handle an esoteric scenario: Given two tasks sending UDP
packets to one another, two packets of the same flow in each direction
handled by different CPUs that result in two conntrack objects in NEW
state, where reply packet loses race. Then, patch #3 adds a testcase for
this scenario. Series from Florian Westphal.

1) NAT engine can falsely detect a port collision if it happens to pick
   up a reply packet as NEW rather than ESTABLISHED. Add extra code to
   detect this and suppress port reallocation in this case.

2) To complete the clash resolution in the reply direction, extend conntrack
   logic to detect clashing conntrack in the reply direction to existing entry.

3) Adds a test case.

Then, an assorted list of fixes follow:

4) Add a selftest for tproxy, from Antonio Ojea.

5) Guard ctnetlink_*_size() functions under
   #if defined(CONFIG_NETFILTER_NETLINK_GLUE_CT) || defined(CONFIG_NF_CONNTRACK_EVENTS)
   From Andy Shevchenko.

6) Use -m socket --transparent in iptables tproxy documentation.
   From XIE Zhibang.

7) Call kfree_rcu() when releasing flowtable hooks to address race with
   netlink dump path, from Phil Sutter.

8) Fix compilation warning in nf_reject with CONFIG_BRIDGE_NETFILTER=n.
   From Simon Horman.

9) Guard ctnetlink_label_size() under CONFIG_NF_CONNTRACK_EVENTS which
   is its only user, to address a compilation warning. From Simon Horman.

10) Use rcu-protected list iteration over basechain hooks from netlink
    dump path.

11) Fix memcg for nf_tables, use GFP_KERNEL_ACCOUNT is not complete.

12) Remove old nfqueue conntrack clash resolution. Instead trying to
    use same destination address consistently which requires double DNAT,
    use the existing clash resolution which allows clashing packets
    go through with different destination. Antonio Ojea originally
    reported an issue from the postrouting chain, I proposed a fix:
    https://lore.kernel.org/netfilter-devel/ZuwSwAqKgCB2a51-@calendula/T/
    which he reported it did not work for him.

13) Adds a selftest for patch 12.

14) Fixes ipvs.sh selftest.

netfilter pull request 24-09-26

* tag 'nf-24-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  selftests: netfilter: Avoid hanging ipvs.sh
  kselftest: add test for nfqueue induced conntrack race
  netfilter: nfnetlink_queue: remove old clash resolution logic
  netfilter: nf_tables: missing objects with no memcg accounting
  netfilter: nf_tables: use rcu chain hook list iterator from netlink dump path
  netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS
  netfilter: nf_reject: Fix build warning when CONFIG_BRIDGE_NETFILTER=n
  netfilter: nf_tables: Keep deleted flowtable hooks until after RCU
  docs: tproxy: ignore non-transparent sockets in iptables
  netfilter: ctnetlink: Guard possible unused functions
  selftests: netfilter: nft_tproxy.sh: add tcp tests
  selftests: netfilter: add reverse-clash resolution test case
  netfilter: conntrack: add clash resolution for reverse collisions
  netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Oct 7, 2024
On the node of an NFS client, some files saved in the mountpoint of the
NFS server were copied to another location of the same NFS server.
Accidentally, the nfs42_complete_copies() got a NULL-pointer dereference
crash with the following syslog:

[232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058
[232066.588586] Mem abort info:
[232066.588701]   ESR = 0x0000000096000007
[232066.588862]   EC = 0x25: DABT (current EL), IL = 32 bits
[232066.589084]   SET = 0, FnV = 0
[232066.589216]   EA = 0, S1PTW = 0
[232066.589340]   FSC = 0x07: level 3 translation fault
[232066.589559] Data abort info:
[232066.589683]   ISV = 0, ISS = 0x00000007
[232066.589842]   CM = 0, WnR = 0
[232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400
[232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000
[232066.590757] Internal error: Oops: 96000007 [#1] SMP
[232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2
[232066.591052]  vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs
[232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1
[232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06
[232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4]
[232066.598595] sp : ffff8000f568fc70
[232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000
[232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001
[232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050
[232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000
[232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000
[232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6
[232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828
[232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a
[232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058
[232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000
[232066.601636] Call trace:
[232066.601749]  nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.601998]  nfs4_do_reclaim+0x1b8/0x28c [nfsv4]
[232066.602218]  nfs4_state_manager+0x928/0x10f0 [nfsv4]
[232066.602455]  nfs4_run_state_manager+0x78/0x1b0 [nfsv4]
[232066.602690]  kthread+0x110/0x114
[232066.602830]  ret_from_fork+0x10/0x20
[232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00)
[232066.603284] SMP: stopping secondary CPUs
[232066.606936] Starting crashdump kernel...
[232066.607146] Bye!

Analysing the vmcore, we know that nfs4_copy_state listed by destination
nfs_server->ss_copies was added by the field copies in handle_async_copy(),
and we found a waiting copy process with the stack as:
PID: 3511963  TASK: ffff710028b47e00  CPU: 0   COMMAND: "cp"
 #0 [ffff8001116ef740] __switch_to at ffff8000081b92f4
 #1 [ffff8001116ef760] __schedule at ffff800008dd0650
 #2 [ffff8001116ef7c0] schedule at ffff800008dd0a00
 #3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0
 #4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c
 #5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898
 #6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4]
 #7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4]
 #8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4]
 #9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4]

The NULL-pointer dereference was due to nfs42_complete_copies() listed
the nfs_server->ss_copies by the field ss_copies of nfs4_copy_state.
So the nfs4_copy_state address ffff0100f98fa3f0 was offset by 0x10 and
the data accessed through this pointer was also incorrect. Generally,
the ordered list nfs4_state_owner->so_states indicate open(O_RDWR) or
open(O_WRITE) states are reclaimed firstly by nfs4_reclaim_open_state().
When destination state reclaim is failed with NFS_STATE_RECOVERY_FAILED
and copies are not deleted in nfs_server->ss_copies, the source state
may be passed to the nfs42_complete_copies() process earlier, resulting
in this crash scene finally. To solve this issue, we add a list_head
nfs_server->ss_src_copies for a server-to-server copy specially.

Fixes: 0e65a32 ("NFS: handle source server reboot")
Signed-off-by: Yanjun Zhang <[email protected]>
Reviewed-by: Trond Myklebust <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Oct 8, 2024
Tariq Toukan says:

====================
net/mlx5: hw counters refactor

This is a patchset re-post, see:
https://lore.kernel.org/[email protected]

In this patchset, Cosmin refactors hw counters and solves perf scaling
issue.

Series generated against:
commit c824deb ("cxgb4: clip_tbl: Fix spelling mistake "wont" -> "won't"")

HW counters are central to mlx5 driver operations. They are hardware
objects created and used alongside most steering operations, and queried
from a variety of places. Most counters are queried in bulk from a
periodic task in fs_counters.c.

Counter performance is important and as such, a variety of improvements
have been done over the years. Currently, counters are allocated from
pools, which are bulk allocated to amortize the cost of firmware
commands. Counters are managed through an IDR, a doubly linked list and
two atomic single linked lists. Adding/removing counters is a complex
dance between user contexts requesting it and the mlx5_fc_stats_work
task which does most of the work.

Under high load (e.g. from connection tracking flow insertion/deletion),
the counter code becomes a bottleneck, as seen on flame graphs. Whenever
a counter is deleted, it gets added to a list and the wq task is
scheduled to run immediately to actually delete it. This is done via
mod_delayed_work which uses an internal spinlock. In some tests, waiting
for this spinlock took up to 66% of all samples.

This series refactors the counter code to use a more straight-forward
approach, avoiding the mod_delayed_work problem and making the code
easier to understand. For that:

- patch #1 moves counters data structs to a more appropriate place.
- patch #2 simplifies the bulk query allocation scheme by using vmalloc.
- patch #3 replaces the IDR+3 lists with an xarray. This is the main
  patch of the series, solving the spinlock congestion issue.
- patch #4 removes an unnecessary cacheline alignment causing a lot of
  memory to be wasted.
- patches #5 and #6 are small cleanups enabled by the refactoring.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Oct 8, 2024
Edward Cree says:

====================
sfc: per-queue stats

This series implements the netdev_stat_ops interface for per-queue
 statistics in the sfc driver, partly using existing counters that
 were originally added for ethtool -S output.

Changed in v4:
* remove RFC tags

Changed in v3:
* make TX stats count completions rather than enqueues
* add new patch #4 to account for XDP TX separately from netdev
  traffic and include it in base_stats
* move the tx_queue->old_* members out of the fastpath cachelines
* note on patch #6 that our hw_gso stats still count enqueues
* RFC since net-next is closed right now

Changed in v2:
* exclude (dedicated) XDP TXQ stats from per-queue TX stats
* explain patch #3 better
====================

Signed-off-by: David S. Miller <[email protected]>
ColinIanKing pushed a commit that referenced this pull request Oct 9, 2024
Daniel Machon says:

====================
net: sparx5: prepare for lan969x switch driver

== Description:

This series is the first of a multi-part series, that prepares and adds
support for the new lan969x switch driver.

The upstreaming efforts is split into multiple series (might change a
bit as we go along):

    1) Prepare the Sparx5 driver for lan969x (this series)
    2) Add support lan969x (same basic features as Sparx5 provides +
       RGMII, excl.  FDMA and VCAP)
    3) Add support for lan969x FDMA
    4) Add support for lan969x VCAP

== Lan969x in short:

The lan969x Ethernet switch family [1] provides a rich set of
switching features and port configurations (up to 30 ports) from 10Mbps
to 10Gbps, with support for RGMII, SGMII, QSGMII, USGMII, and USXGMII,
ideal for industrial & process automation infrastructure applications,
transport, grid automation, power substation automation, and ring &
intra-ring topologies. The LAN969x family is hardware and software
compatible and scalable supporting 46Gbps to 102Gbps switch bandwidths.

== Preparing Sparx5 for lan969x:

The lan969x switch chip reuses many of the IP's of the Sparx5 switch
chip, therefore it has been decided to add support through the existing
Sparx5 driver, in order to avoid a bunch of duplicate code. However, in
order to reuse the Sparx5 switch driver, we have to introduce some
mechanisms to handle the chip differences that are there.  These
mechanisms are:

    - Platform match data to contain all the differences that needs to
      be handled (constants, ops etc.)

    - Register macro indirection layer so that we can reuse the existing
      register macros.

    - Function for branching out on platform type where required.

In some places we ops out functions and in other places we branch on the
chip type. Exactly when we choose one over the other, is an estimate in
each case.

After this series is applied, the Sparx5 driver will be prepared for
lan969x and still function exactly as before.

== Patch breakdown:

Patch #1        adds private match data

Patch #2        adds register macro indirection layer

Patch #3-#4     does some preparation work

Patch #5-#7     adds chip constants and updates the code to use them

Patch #8-#13    adds and uses ops for handling functions differently on the
                two platforms.

Patch #14       adds and uses a macro for branching out on the chip type.

Patch #15 (NEW) redefines macros for internal ports and PGID's.

[1] https://www.microchip.com/en-us/product/lan9698

To: David S. Miller <[email protected]>
To: Eric Dumazet <[email protected]>
To: Jakub Kicinski <[email protected]>
To: Paolo Abeni <[email protected]>
To: Lars Povlsen <[email protected]>
To: Steen Hegelund <[email protected]>
To: [email protected]
To: [email protected]
To: [email protected]
To: Richard Cochran <[email protected]>
To: [email protected]
To: [email protected]
To: [email protected]
To: [email protected]
To: [email protected]
To: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

Signed-off-by: Daniel Machon <[email protected]>
====================

Link: https://patch.msgid.link/20241004-b4-sparx5-lan969x-switch-driver-v2-0-d3290f581663@microchip.com
Signed-off-by: Paolo Abeni <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant