Relicense as SSPL #3

stuardreinhild · 2021-01-28T14:40:43Z

This is crucial to be able to host the brand new doubled-down source-open Elasticsearch on Linux.

The hard work to relicense the whole kernel has been sponsored by the legal department of Elastic. 🎉

This is crucial to be able to host the brand new doubled-down source-open Elasticsearch on Linux. Signed-off-by: Stuart Reinhild

gregkh

Forgot to put your real email address in your signed-off-by line.

stuardreinhild · 2021-01-28T15:11:21Z

I don’t use real email, I only use services of Deutsche Post AG for all my correspondence.

gregkh · 2021-01-28T15:30:12Z

Then sorry, no patches will be accepted in this manner.

…/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.11, take #3 - Avoid clobbering extra registers on initialisation

commit b25b0b8 upstream. With the following patches: - btrfs: backref, only collect file extent items matching backref offset - btrfs: backref, not adding refs from shared block when resolving normal backref - btrfs: backref, only search backref entries from leaves of the same root we only collect the normal data refs we want, so the imprecise upper bound total_refs of that EXTENT_ITEM could now be changed to the count of the normal backref entry we want to search. Background and how the patches fit together: Btrfs has two types of data backref. For BTRFS_EXTENT_DATA_REF_KEY type of backref, we don't have the exact block number. Therefore, we need to call resolve_indirect_refs. It uses btrfs_search_slot to locate the leaf block. Then we need to walk through the leaves to search for the EXTENT_DATA items that have disk bytenr matching the extent item (add_all_parents). When resolving indirect refs, we could take entries that don't belong to the backref entry we are searching for right now. For that reason when searching backref entry, we always use total refs of that EXTENT_ITEM rather than individual count. For example: item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize extent refs 24 gen 7302 flags DATA shared data backref parent 394985472 count 10 #1 extent data backref root 257 objectid 260 offset 1048576 count 3 #2 extent data backref root 256 objectid 260 offset 65536 count 6 #3 extent data backref root 257 objectid 260 offset 65536 count 5 #4 For example, when searching backref entry #4, we'll use total_refs 24, a very loose loop ending condition, instead of total_refs = 5. But using total_refs = 24 is not accurate. Sometimes, we'll never find all the refs from specific root. As a result, the loop keeps on going until we reach the end of that inode. The first 3 patches, handle 3 different types refs we might encounter. These refs do not belong to the normal backref we are searching, and hence need to be skipped. This patch changes the total_refs to correct number so that we could end loop as soon as we find all the refs we want. btrfs send uses backref to find possible clone sources, the following is a simple test to compare the results with and without this patch: $ btrfs subvolume create /sub1 $ for i in `seq 1 163840`; do dd if=/dev/zero of=/sub1/file bs=64K count=1 seek=$((i-1)) conv=notrunc oflag=direct done $ btrfs subvolume snapshot /sub1 /sub2 $ for i in `seq 1 163840`; do dd if=/dev/zero of=/sub1/file bs=4K count=1 seek=$(((i-1)*16+10)) conv=notrunc oflag=direct done $ btrfs subvolume snapshot -r /sub1 /snap1 $ time btrfs send /snap1 | btrfs receive /volume2 Without this patch: real 69m48.124s user 0m50.199s sys 70m15.600s With this patch: real 1m59.683s user 0m35.421s sys 2m42.684s Reviewed-by: Josef Bacik <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: ethanwu <[email protected]> [ add patchset cover letter with background and numbers ] Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…se_mm This fix the bad fault reported by KUAP when io_wqe_worker access userspace. Bug: Read fault blocked by KUAP! WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0 NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0 LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 .......... Call Trace: [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable) [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120 [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0 .......... NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0 LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0 interrupt: 300 [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280 [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780 [c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120 [c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs] [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460 [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200 [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250 [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800 [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0 [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0 [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c The kernel consider thread AMR value for kernel thread to be AMR_KUAP_BLOCKED. Hence access to userspace is denied. This of course not correct and we should allow userspace access after kthread_use_mm(). To be precise, kthread_use_mm() should inherit the AMR value of the operating address space. But, the AMR value is thread-specific and we inherit the address space and not thread access restrictions. Because of this ignore AMR value when accessing userspace via kernel thread. current_thread_amr/iamr() are updated, because we use them in the below stack. .... [ 530.710838] CPU: 13 PID: 5587 Comm: io_wqe_worker-0 Tainted: G D 5.11.0-rc6+ #3 .... NIP [c0000000000aa0c8] pkey_access_permitted+0x28/0x90 LR [c0000000004b9278] gup_pte_range+0x188/0x420 --- interrupt: 700 [c00000001c4ef3f0] [0000000000000000] 0x0 (unreliable) [c00000001c4ef490] [c0000000004bd39c] gup_pgd_range+0x3ac/0xa20 [c00000001c4ef5a0] [c0000000004bdd44] internal_get_user_pages_fast+0x334/0x410 [c00000001c4ef620] [c000000000852028] iov_iter_get_pages+0xf8/0x5c0 [c00000001c4ef6a0] [c0000000007da44c] bio_iov_iter_get_pages+0xec/0x700 [c00000001c4ef770] [c0000000006a325c] iomap_dio_bio_actor+0x2ac/0x4f0 [c00000001c4ef810] [c00000000069cd94] iomap_apply+0x2b4/0x740 [c00000001c4ef920] [c0000000006a38b8] __iomap_dio_rw+0x238/0x5c0 [c00000001c4ef9d0] [c0000000006a3c60] iomap_dio_rw+0x20/0x80 [c00000001c4ef9f0] [c008000001927a30] xfs_file_dio_aio_write+0x1f8/0x650 [xfs] [c00000001c4efa60] [c0080000019284dc] xfs_file_write_iter+0xc4/0x130 [xfs] [c00000001c4efa90] [c000000000669984] io_write+0x104/0x4b0 [c00000001c4efbb0] [c00000000066cea4] io_issue_sqe+0x3d4/0xf50 [c00000001c4efc60] [c000000000670200] io_wq_submit_work+0xb0/0x2f0 [c00000001c4efcb0] [c000000000674268] io_worker_handle_work+0x248/0x4a0 [c00000001c4efd30] [c0000000006746e8] io_wqe_worker+0x228/0x2a0 [c00000001c4efda0] [c00000000019d994] kthread+0x1b4/0x1c0 Fixes: 48a8ab4 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.") Reported-by: Zorro Lang <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]

[ Upstream commit b1b8eb1 ] The previous fix left another warning in randconfig builds: WARNING: unmet direct dependencies detected for SND_SOC_QDSP6 Depends on [n]: SOUND [=y] && !UML && SND [=y] && SND_SOC [=y] && SND_SOC_QCOM [=y] && QCOM_APR [=y] && COMMON_CLK [=n] Selected by [y]: - SND_SOC_MSM8996 [=y] && SOUND [=y] && !UML && SND [=y] && SND_SOC [=y] && SND_SOC_QCOM [=y] && QCOM_APR [=y] Add one more dependency for this one. Fixes: 2bc8831 ("ASoC: qcom: fix SDM845 & QDSP6 dependencies more") Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit 4a9d81c ] If the elem is deleted during be iterated on it, the iteration process will fall into an endless loop. kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [nfsd:17137] PID: 17137 TASK: ffff8818d93c0000 CPU: 4 COMMAND: "nfsd" [exception RIP: __state_in_grace+76] RIP: ffffffffc00e817c RSP: ffff8818d3aefc98 RFLAGS: 00000246 RAX: ffff881dc0c38298 RBX: ffffffff81b03580 RCX: ffff881dc02c9f50 RDX: ffff881e3fce8500 RSI: 0000000000000001 RDI: ffffffff81b03580 RBP: ffff8818d3aefca0 R8: 0000000000000020 R9: ffff8818d3aefd40 R10: ffff88017fc03800 R11: ffff8818e83933c0 R12: ffff8818d3aefd40 R13: 0000000000000000 R14: ffff8818e8391068 R15: ffff8818fa6e4000 CS: 0010 SS: 0018 #0 [ffff8818d3aefc98] opens_in_grace at ffffffffc00e81e3 [grace] gregkh#1 [ffff8818d3aefca8] nfs4_preprocess_stateid_op at ffffffffc02a3e6c [nfsd] gregkh#2 [ffff8818d3aefd18] nfsd4_write at ffffffffc028ed5b [nfsd] gregkh#3 [ffff8818d3aefd80] nfsd4_proc_compound at ffffffffc0290a0d [nfsd] gregkh#4 [ffff8818d3aefdd0] nfsd_dispatch at ffffffffc027b800 [nfsd] gregkh#5 [ffff8818d3aefe08] svc_process_common at ffffffffc02017f3 [sunrpc] gregkh#6 [ffff8818d3aefe70] svc_process at ffffffffc0201ce3 [sunrpc] gregkh#7 [ffff8818d3aefe98] nfsd at ffffffffc027b117 [nfsd] gregkh#8 [ffff8818d3aefec8] kthread at ffffffff810b88c1 gregkh#9 [ffff8818d3aeff50] ret_from_fork at ffffffff816d1607 The troublemake elem: crash> lock_manager ffff881dc0c38298 struct lock_manager { list = { next = 0xffff881dc0c38298, prev = 0xffff881dc0c38298 }, block_opens = false } Fixes: c87fb4a ("lockd: NLM grace period shouldn't block NFSv4 opens") Signed-off-by: Cheng Lin <[email protected]> Signed-off-by: Yi Wang <[email protected]> Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit d9e4498 ] Like other tunneling interfaces, the bareudp doesn't need TXLOCK. So, It is good to set the NETIF_F_LLTX flag to improve performance and to avoid lockdep's false-positive warning. Test commands: ip netns add A ip netns add B ip link add veth0 netns A type veth peer name veth1 netns B ip netns exec A ip link set veth0 up ip netns exec A ip a a 10.0.0.1/24 dev veth0 ip netns exec B ip link set veth1 up ip netns exec B ip a a 10.0.0.2/24 dev veth1 for i in {2..1} do let A=$i-1 ip netns exec A ip link add bareudp$i type bareudp \ dstport $i ethertype ip ip netns exec A ip link set bareudp$i up ip netns exec A ip a a 10.0.$i.1/24 dev bareudp$i ip netns exec A ip r a 10.0.$i.2 encap ip src 10.0.$A.1 \ dst 10.0.$A.2 via 10.0.$i.2 dev bareudp$i ip netns exec B ip link add bareudp$i type bareudp \ dstport $i ethertype ip ip netns exec B ip link set bareudp$i up ip netns exec B ip a a 10.0.$i.2/24 dev bareudp$i ip netns exec B ip r a 10.0.$i.1 encap ip src 10.0.$A.2 \ dst 10.0.$A.1 via 10.0.$i.1 dev bareudp$i done ip netns exec A ping 10.0.2.2 Splat looks like: [ 96.992803][ T822] ============================================ [ 96.993954][ T822] WARNING: possible recursive locking detected [ 96.995102][ T822] 5.10.0+ #819 Not tainted [ 96.995927][ T822] -------------------------------------------- [ 96.997091][ T822] ping/822 is trying to acquire lock: [ 96.998083][ T822] ffff88810f753898 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960 [ 96.999813][ T822] [ 96.999813][ T822] but task is already holding lock: [ 97.001192][ T822] ffff88810c385498 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960 [ 97.002908][ T822] [ 97.002908][ T822] other info that might help us debug this: [ 97.004401][ T822] Possible unsafe locking scenario: [ 97.004401][ T822] [ 97.005784][ T822] CPU0 [ 97.006407][ T822] ---- [ 97.007010][ T822] lock(_xmit_NONE#2); [ 97.007779][ T822] lock(_xmit_NONE#2); [ 97.008550][ T822] [ 97.008550][ T822] *** DEADLOCK *** [ 97.008550][ T822] [ 97.010057][ T822] May be due to missing lock nesting notation [ 97.010057][ T822] [ 97.011594][ T822] 7 locks held by ping/822: [ 97.012426][ T822] #0: ffff888109a144f0 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0x12f7/0x2b00 [ 97.014191][ T822] gregkh#1: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0x249/0x2020 [ 97.016045][ T822] gregkh#2: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x1fd/0x2960 [ 97.017897][ T822] gregkh#3: ffff88810c385498 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960 [ 97.019684][ T822] gregkh#4: ffffffffbce2f600 (rcu_read_lock){....}-{1:2}, at: bareudp_xmit+0x31b/0x3690 [bareudp] [ 97.021573][ T822] gregkh#5: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0x249/0x2020 [ 97.023424][ T822] gregkh#6: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x1fd/0x2960 [ 97.025259][ T822] [ 97.025259][ T822] stack backtrace: [ 97.026349][ T822] CPU: 3 PID: 822 Comm: ping Not tainted 5.10.0+ #819 [ 97.027609][ T822] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 97.029407][ T822] Call Trace: [ 97.030015][ T822] dump_stack+0x99/0xcb [ 97.030783][ T822] __lock_acquire.cold.77+0x149/0x3a9 [ 97.031773][ T822] ? stack_trace_save+0x81/0xa0 [ 97.032661][ T822] ? register_lock_class+0x1910/0x1910 [ 97.033673][ T822] ? register_lock_class+0x1910/0x1910 [ 97.034679][ T822] ? rcu_read_lock_sched_held+0x91/0xc0 [ 97.035697][ T822] ? rcu_read_lock_bh_held+0xa0/0xa0 [ 97.036690][ T822] lock_acquire+0x1b2/0x730 [ 97.037515][ T822] ? __dev_queue_xmit+0x1f52/0x2960 [ 97.038466][ T822] ? check_flags+0x50/0x50 [ 97.039277][ T822] ? netif_skb_features+0x296/0x9c0 [ 97.040226][ T822] ? validate_xmit_skb+0x29/0xb10 [ 97.041151][ T822] _raw_spin_lock+0x30/0x70 [ 97.041977][ T822] ? __dev_queue_xmit+0x1f52/0x2960 [ 97.042927][ T822] __dev_queue_xmit+0x1f52/0x2960 [ 97.043852][ T822] ? netdev_core_pick_tx+0x290/0x290 [ 97.044824][ T822] ? mark_held_locks+0xb7/0x120 [ 97.045712][ T822] ? lockdep_hardirqs_on_prepare+0x12c/0x3e0 [ 97.046824][ T822] ? __local_bh_enable_ip+0xa5/0xf0 [ 97.047771][ T822] ? ___neigh_create+0x12a8/0x1eb0 [ 97.048710][ T822] ? trace_hardirqs_on+0x41/0x120 [ 97.049626][ T822] ? ___neigh_create+0x12a8/0x1eb0 [ 97.050556][ T822] ? __local_bh_enable_ip+0xa5/0xf0 [ 97.051509][ T822] ? ___neigh_create+0x12a8/0x1eb0 [ 97.052443][ T822] ? check_chain_key+0x244/0x5f0 [ 97.053352][ T822] ? rcu_read_lock_bh_held+0x56/0xa0 [ 97.054317][ T822] ? ip_finish_output2+0x6ea/0x2020 [ 97.055263][ T822] ? pneigh_lookup+0x410/0x410 [ 97.056135][ T822] ip_finish_output2+0x6ea/0x2020 [ ... ] Acked-by: Guillaume Nault <[email protected]> Fixes: 571912c ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.") Signed-off-by: Taehee Yoo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 3a21777 upstream. We had kernel panic, it is caused by unload module and last close confirmation. call trace: [1196029.743127] free_sess+0x15/0x50 [rtrs_client] [1196029.743128] rtrs_clt_close+0x4c/0x70 [rtrs_client] [1196029.743129] ? rnbd_clt_unmap_device+0x1b0/0x1b0 [rnbd_client] [1196029.743130] close_rtrs+0x25/0x50 [rnbd_client] [1196029.743131] rnbd_client_exit+0x93/0xb99 [rnbd_client] [1196029.743132] __x64_sys_delete_module+0x190/0x260 And in the crashdump confirmation kworker is also running. PID: 6943 TASK: ffff9e2ac8098000 CPU: 4 COMMAND: "kworker/4:2" #0 [ffffb206cf337c30] __schedule at ffffffff9f93f891 gregkh#1 [ffffb206cf337cc8] schedule at ffffffff9f93fe98 gregkh#2 [ffffb206cf337cd0] schedule_timeout at ffffffff9f943938 gregkh#3 [ffffb206cf337d50] wait_for_completion at ffffffff9f9410a7 gregkh#4 [ffffb206cf337da0] __flush_work at ffffffff9f08ce0e gregkh#5 [ffffb206cf337e20] rtrs_clt_close_conns at ffffffffc0d5f668 [rtrs_client] gregkh#6 [ffffb206cf337e48] rtrs_clt_close at ffffffffc0d5f801 [rtrs_client] gregkh#7 [ffffb206cf337e68] close_rtrs at ffffffffc0d26255 [rnbd_client] gregkh#8 [ffffb206cf337e78] free_sess at ffffffffc0d262ad [rnbd_client] gregkh#9 [ffffb206cf337e88] rnbd_clt_put_dev at ffffffffc0d266a7 [rnbd_client] The problem is both code path try to close same session, which lead to panic. To fix it, just skip the sess if the refcount already drop to 0. Fixes: f7a7a5c ("block/rnbd: client: main functionality") Signed-off-by: Jack Wang <[email protected]> Reviewed-by: Gioh Kim <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit c312446 ] The mpath disk node takes a reference on the request mpath request queue when adding live path to the mpath gendisk. However if we connected to an inaccessible path device_add_disk is not called, so if we disconnect and remove the mpath gendisk we endup putting an reference on the request queue that was never taken [1]. Fix that to check if we ever added a live path (using NVME_NS_HEAD_HAS_DISK flag) and if not, clear the disk->queue reference. [1]: ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 1 PID: 1372 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0 CPU: 1 PID: 1372 Comm: nvme Tainted: G O 5.7.0-rc2+ gregkh#3 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1 04/01/2014 RIP: 0010:refcount_warn_saturate+0xa6/0xf0 RSP: 0018:ffffb29e8053bdc0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff8b7a2f4fc060 RCX: 0000000000000007 RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8b7a3ec99980 RBP: ffff8b7a2f4fc000 R08: 00000000000002e1 R09: 0000000000000004 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: fffffffffffffff2 R14: ffffb29e8053bf08 R15: ffff8b7a320e2da0 FS: 00007f135d4ca800(0000) GS:ffff8b7a3ec80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005651178c0c30 CR3: 000000003b650005 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: disk_release+0xa2/0xc0 device_release+0x28/0x80 kobject_put+0xa5/0x1b0 nvme_put_ns_head+0x26/0x70 [nvme_core] nvme_put_ns+0x30/0x60 [nvme_core] nvme_remove_namespaces+0x9b/0xe0 [nvme_core] nvme_do_delete_ctrl+0x43/0x5c [nvme_core] nvme_sysfs_delete.cold+0x8/0xd [nvme_core] kernfs_fop_write+0xc1/0x1a0 vfs_write+0xb6/0x1a0 ksys_write+0x5f/0xe0 do_syscall_64+0x52/0x1a0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported-by: Anton Eidelman <[email protected]> Tested-by: Anton Eidelman <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

commit 311eab8 upstream. devm_gpiod_get_index() doesn't return NULL but -ENOENT when the requested GPIO doesn't exist, leading to the following messages: [ 2.742468] gpiod_direction_input: invalid GPIO (errorpointer) [ 2.748147] can't set direction for gpio gregkh#2: -2 [ 2.753081] gpiod_direction_input: invalid GPIO (errorpointer) [ 2.758724] can't set direction for gpio gregkh#3: -2 [ 2.763666] gpiod_direction_output: invalid GPIO (errorpointer) [ 2.769394] can't set direction for gpio gregkh#4: -2 [ 2.774341] gpiod_direction_input: invalid GPIO (errorpointer) [ 2.779981] can't set direction for gpio gregkh#5: -2 [ 2.784545] ff000a20.serial: ttyCPM1 at MMIO 0xfff00a20 (irq = 39, base_baud = 8250000) is a CPM UART Use devm_gpiod_get_index_optional() instead. At the same time, handle the error case and properly exit with an error. Fixes: 97cbaf2 ("tty: serial: cpm_uart: Convert to use GPIO descriptors") Cc: [email protected] Cc: Linus Walleij <[email protected]> Signed-off-by: Christophe Leroy <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Link: https://lore.kernel.org/r/694a25fdce548c5ee8b060ef6a4b02746b8f25c0.1591986307.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit c4c0bdc ] Currently, if a bpf program has more than one subprograms, each program will be jitted separately. For programs with bpf-to-bpf calls the prog->aux->num_exentries is not setup properly. For example, with bpf_iter_netlink.c modified to force one function to be not inlined and with CONFIG_BPF_JIT_ALWAYS_ON the following error is seen: $ ./test_progs -n 3/3 ... libbpf: failed to load program 'iter/netlink' libbpf: failed to load object 'bpf_iter_netlink' libbpf: failed to load BPF skeleton 'bpf_iter_netlink': -4007 test_netlink:FAIL:bpf_iter_netlink__open_and_load skeleton open_and_load failed gregkh#3/3 netlink:FAIL The dmesg shows the following errors: ex gen bug which is triggered by the following code in arch/x86/net/bpf_jit_comp.c: if (excnt >= bpf_prog->aux->num_exentries) { pr_err("ex gen bug\n"); return -EFAULT; } This patch fixes the issue by computing proper num_exentries for each subprogram before calling JIT. Signed-off-by: Yonghong Song <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

commit 0bd0db4 upstream. The `INSN_CONFIG` comedi instruction with sub-instruction code `INSN_CONFIG_DIGITAL_TRIG` includes a base channel in `data[3]`. This is used as a right shift amount for other bitmask values without being checked. Shift amounts greater than or equal to 32 will result in undefined behavior. Add code to deal with this. Fixes: 33cdce6 ("staging: comedi: addi_apci_1032: conform to new INSN_CONFIG_DIGITAL_TRIG") Cc: <[email protected]> gregkh#3.8+ Signed-off-by: Ian Abbott <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 926234f upstream. The `INSN_CONFIG` comedi instruction with sub-instruction code `INSN_CONFIG_DIGITAL_TRIG` includes a base channel in `data[3]`. This is used as a right shift amount for other bitmask values without being checked. Shift amounts greater than or equal to 32 will result in undefined behavior. Add code to deal with this. Fixes: 1e15687 ("staging: comedi: addi_apci_1564: add Change-of-State interrupt subdevice and required functions") Cc: <[email protected]> gregkh#3.17+ Signed-off-by: Ian Abbott <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit be6577a upstream. Stalls are quite frequent with recent kernels. I enabled CONFIG_SOFTLOCKUP_DETECTOR and I caught the following stall: watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [cc1:22803] CPU: 0 PID: 22803 Comm: cc1 Not tainted 5.6.17+ gregkh#3 Hardware name: 9000/800/rp3440 IAOQ[0]: d_alloc_parallel+0x384/0x688 IAOQ[1]: d_alloc_parallel+0x388/0x688 RP(r2): d_alloc_parallel+0x134/0x688 Backtrace: [<000000004036974c>] __lookup_slow+0xa4/0x200 [<0000000040369fc8>] walk_component+0x288/0x458 [<000000004036a9a0>] path_lookupat+0x88/0x198 [<000000004036e748>] filename_lookup+0xa0/0x168 [<000000004036e95c>] user_path_at_empty+0x64/0x80 [<000000004035d93c>] vfs_statx+0x104/0x158 [<000000004035dfcc>] __do_sys_lstat64+0x44/0x80 [<000000004035e5a0>] sys_lstat64+0x20/0x38 [<0000000040180054>] syscall_exit+0x0/0x14 The code was stuck in this loop in d_alloc_parallel: 4037d414: 0e 00 10 dc ldd 0(r16),ret0 4037d418: c7 fc 5f ed bb,< ret0,1f,4037d414 <d_alloc_parallel+0x384> 4037d41c: 08 00 02 40 nop This is the inner loop of bit_spin_lock which is called by hlist_bl_unlock in d_alloc_parallel: static inline void bit_spin_lock(int bitnum, unsigned long *addr) { /* * Assuming the lock is uncontended, this never enters * the body of the outer loop. If it is contended, then * within the inner loop a non-atomic test is used to * busywait with less bus contention for a good time to * attempt to acquire the lock bit. */ preempt_disable(); #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) while (unlikely(test_and_set_bit_lock(bitnum, addr))) { preempt_enable(); do { cpu_relax(); } while (test_bit(bitnum, addr)); preempt_disable(); } #endif __acquire(bitlock); } After consideration, I realized that we must be losing bit unlocks. Then, I noticed that we missed defining atomic64_set_release(). Adding this define fixes the stalls in bit operations. Signed-off-by: Dave Anglin <[email protected]> Cc: [email protected] Signed-off-by: Helge Deller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

Add 3 new tests for tag-based KASAN modes: 1. Check that match-all pointer tag is not assigned randomly. 2. Check that 0xff works as a match-all pointer tag. 3. Check that there are no match-all memory tags. Note, that test #3 causes a significant number (255) of KASAN reports to be printed during execution for the SW_TAGS mode. [[email protected]: export kasan_poison] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: s/EXPORT_SYMBOL_GPL/EXPORT_SYMBOL/, per Andrey] Link: https://linux-review.googlesource.com/id/I78f1375efafa162b37f3abcb2c5bc2f3955dfd8e Link: https://lkml.kernel.org/r/da841a5408e2204bf25f3b23f70540a65844e8a4.1610733117.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Marco Elver <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Branislav Rankov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Kevin Brodsky <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

[ Upstream commit c5c97ca ] The ubsan reported the following error. It was because sample's raw data missed u32 padding at the end. So it broke the alignment of the array after it. The raw data contains an u32 size prefix so the data size should have an u32 padding after 8-byte aligned data. 27: Sample parsing :util/synthetic-events.c:1539:4: runtime error: store to misaligned address 0x62100006b9bc for type '__u64' (aka 'unsigned long long'), which requires 8 byte alignment 0x62100006b9bc: note: pointer points here 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ #0 0x561532a9fc96 in perf_event__synthesize_sample util/synthetic-events.c:1539:13 #1 0x5615327f4a4f in do_test tests/sample-parsing.c:284:8 #2 0x5615327f3f50 in test__sample_parsing tests/sample-parsing.c:381:9 #3 0x56153279d3a1 in run_test tests/builtin-test.c:424:9 #4 0x56153279c836 in test_and_print tests/builtin-test.c:454:9 #5 0x56153279b7eb in __cmd_test tests/builtin-test.c:675:4 gregkh#6 0x56153279abf0 in cmd_test tests/builtin-test.c:821:9 gregkh#7 0x56153264e796 in run_builtin perf.c:312:11 gregkh#8 0x56153264cf03 in handle_internal_command perf.c:364:8 gregkh#9 0x56153264e47d in run_argv perf.c:408:2 gregkh#10 0x56153264c9a9 in main perf.c:538:3 gregkh#11 0x7f137ab6fbbc in __libc_start_main (/lib64/libc.so.6+0x38bbc) gregkh#12 0x561532596828 in _start ... SUMMARY: UndefinedBehaviorSanitizer: misaligned-pointer-use util/synthetic-events.c:1539:4 in Fixes: 045f8cd ("perf tests: Add a sample parsing test") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit c5c97ca ] The ubsan reported the following error. It was because sample's raw data missed u32 padding at the end. So it broke the alignment of the array after it. The raw data contains an u32 size prefix so the data size should have an u32 padding after 8-byte aligned data. 27: Sample parsing :util/synthetic-events.c:1539:4: runtime error: store to misaligned address 0x62100006b9bc for type '__u64' (aka 'unsigned long long'), which requires 8 byte alignment 0x62100006b9bc: note: pointer points here 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ #0 0x561532a9fc96 in perf_event__synthesize_sample util/synthetic-events.c:1539:13 #1 0x5615327f4a4f in do_test tests/sample-parsing.c:284:8 #2 0x5615327f3f50 in test__sample_parsing tests/sample-parsing.c:381:9 #3 0x56153279d3a1 in run_test tests/builtin-test.c:424:9 #4 0x56153279c836 in test_and_print tests/builtin-test.c:454:9 #5 0x56153279b7eb in __cmd_test tests/builtin-test.c:675:4 #6 0x56153279abf0 in cmd_test tests/builtin-test.c:821:9 #7 0x56153264e796 in run_builtin perf.c:312:11 #8 0x56153264cf03 in handle_internal_command perf.c:364:8 #9 0x56153264e47d in run_argv perf.c:408:2 #10 0x56153264c9a9 in main perf.c:538:3 #11 0x7f137ab6fbbc in __libc_start_main (/lib64/libc.so.6+0x38bbc) #12 0x561532596828 in _start ... SUMMARY: UndefinedBehaviorSanitizer: misaligned-pointer-use util/synthetic-events.c:1539:4 in Fixes: 045f8cd ("perf tests: Add a sample parsing test") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

Calling btrfs_qgroup_reserve_meta_prealloc from btrfs_delayed_inode_reserve_metadata can result in flushing delalloc while holding a transaction and delayed node locks. This is deadlock prone. In the past multiple commits: * ae5e070 ("btrfs: qgroup: don't try to wait flushing if we're already holding a transaction") * 6f23277 ("btrfs: qgroup: don't commit transaction when we already hold the handle") Tried to solve various aspects of this but this was always a whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock scenario involving btrfs_delayed_node::mutex. Namely, one thread can call btrfs_dirty_inode as a result of reading a file and modifying its atime: PID: 6963 TASK: ffff8c7f3f94c000 CPU: 2 COMMAND: "test" #0 __schedule at ffffffffa529e07d gregkh#1 schedule at ffffffffa529e4ff gregkh#2 schedule_timeout at ffffffffa52a1bdd gregkh#3 wait_for_completion at ffffffffa529eeea <-- sleeps with delayed node mutex held gregkh#4 start_delalloc_inodes at ffffffffc0380db5 gregkh#5 btrfs_start_delalloc_snapshot at ffffffffc0393836 gregkh#6 try_flush_qgroup at ffffffffc03f04b2 gregkh#7 __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6 <-- tries to reserve space and starts delalloc inodes. gregkh#8 btrfs_delayed_update_inode at ffffffffc03e31aa <-- acquires delayed node mutex gregkh#9 btrfs_update_inode at ffffffffc0385ba8 gregkh#10 btrfs_dirty_inode at ffffffffc038627b <-- TRANSACTIION OPENED gregkh#11 touch_atime at ffffffffa4cf0000 gregkh#12 generic_file_read_iter at ffffffffa4c1f123 gregkh#13 new_sync_read at ffffffffa4ccdc8a gregkh#14 vfs_read at ffffffffa4cd0849 gregkh#15 ksys_read at ffffffffa4cd0bd1 gregkh#16 do_syscall_64 at ffffffffa4a052eb gregkh#17 entry_SYSCALL_64_after_hwframe at ffffffffa540008c This will cause an asynchronous work to flush the delalloc inodes to happen which can try to acquire the same delayed_node mutex: PID: 455 TASK: ffff8c8085fa4000 CPU: 5 COMMAND: "kworker/u16:30" #0 __schedule at ffffffffa529e07d gregkh#1 schedule at ffffffffa529e4ff gregkh#2 schedule_preempt_disabled at ffffffffa529e80a gregkh#3 __mutex_lock at ffffffffa529fdcb <-- goes to sleep, never wakes up. gregkh#4 btrfs_delayed_update_inode at ffffffffc03e3143 <-- tries to acquire the mutex gregkh#5 btrfs_update_inode at ffffffffc0385ba8 <-- this is the same inode that pid 6963 is holding gregkh#6 cow_file_range_inline.constprop.78 at ffffffffc0386be7 gregkh#7 cow_file_range at ffffffffc03879c1 gregkh#8 btrfs_run_delalloc_range at ffffffffc038894c gregkh#9 writepage_delalloc at ffffffffc03a3c8f gregkh#10 __extent_writepage at ffffffffc03a4c01 gregkh#11 extent_write_cache_pages at ffffffffc03a500b gregkh#12 extent_writepages at ffffffffc03a6de2 gregkh#13 do_writepages at ffffffffa4c277eb gregkh#14 __filemap_fdatawrite_range at ffffffffa4c1e5bb gregkh#15 btrfs_run_delalloc_work at ffffffffc0380987 <-- starts running delayed nodes gregkh#16 normal_work_helper at ffffffffc03b706c gregkh#17 process_one_work at ffffffffa4aba4e4 gregkh#18 worker_thread at ffffffffa4aba6fd gregkh#19 kthread at ffffffffa4ac0a3d gregkh#20 ret_from_fork at ffffffffa54001ff To fully address those cases the complete fix is to never issue any flushing while holding the transaction or the delayed node lock. This patch achieves it by calling qgroup_reserve_meta directly which will either succeed without flushing or will fail and return -EDQUOT. In the latter case that return value is going to be propagated to btrfs_dirty_inode which will fallback to start a new transaction. That's fine as the majority of time we expect the inode will have BTRFS_DELAYED_NODE_INODE_DIRTY flag set which will result in directly copying the in-memory state. Fixes: c53e965 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT") CC: [email protected] # 5.10+ Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: David Sterba <[email protected]>

[ Upstream commit c855af2 ] Fix the problem of dev being released twice. ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 75 PID: 15700 at lib/refcount.c:28 refcount_warn_saturate+0xd4/0x150 CPU: 75 PID: 15700 Comm: rmmod Tainted: G E 5.10.0-rc3+ #3 Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDDA, BIOS 0.88 07/24/2019 pstate: 40400009 (nZcv daif +PAN -UAO -TCO BTYPE=--) pc : refcount_warn_saturate+0xd4/0x150 lr : refcount_warn_saturate+0xd4/0x150 sp : ffff2028150cbc00 x29: ffff2028150cbc00 x28: ffff2028150121c0 x27: 0000000000000000 x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000003 x23: 0000000000000000 x22: ffff2028150cbc90 x21: ffff2020038a30a8 x20: ffff2028150cbc90 x19: ffff0020cd938020 x18: 0000000000000010 x17: 0000000000000000 x16: 0000000000000000 x15: ffffffffffffffff x14: ffff2028950cb88f x13: ffff2028150cb89d x12: 0000000000000000 x11: 0000000005f5e0ff x10: ffff2028150cb800 x9 : 00000000ffffffd0 x8 : 75203b776f6c6672 x7 : ffff800011a6f7c8 x6 : 0000000000000001 x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000 x2 : ffff202ffe2f9dc0 x1 : ffffa02fecf40000 x0 : 0000000000000026 Call trace: refcount_warn_saturate+0xd4/0x150 devm_drm_dev_init_release+0x50/0x70 devm_action_release+0x20/0x30 release_nodes+0x13c/0x218 devres_release_all+0x80/0x170 device_release_driver_internal+0x128/0x1f0 driver_detach+0x6c/0xe0 bus_remove_driver+0x74/0x100 driver_unregister+0x34/0x60 pci_unregister_driver+0x24/0xd8 hibmc_pci_driver_exit+0x14/0xe858 [hibmc_drm] __arm64_sys_delete_module+0x1fc/0x2d0 el0_svc_common.constprop.3+0xa8/0x188 do_el0_svc+0x80/0xa0 el0_sync_handler+0x8c/0xb0 el0_sync+0x15c/0x180 CPU: 75 PID: 15700 Comm: rmmod Tainted: G E 5.10.0-rc3+ #3 Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDDA, BIOS 0.88 07/24/2019 Call trace: dump_backtrace+0x0/0x208 show_stack+0x2c/0x40 dump_stack+0xd8/0x10c __warn+0xac/0x128 report_bug+0xcc/0x180 bug_handler+0x24/0x78 call_break_hook+0x80/0xa0 brk_handler+0x28/0x68 do_debug_exception+0x9c/0x148 el1_sync_handler+0x7c/0x128 el1_sync+0x80/0x100 refcount_warn_saturate+0xd4/0x150 devm_drm_dev_init_release+0x50/0x70 devm_action_release+0x20/0x30 release_nodes+0x13c/0x218 devres_release_all+0x80/0x170 device_release_driver_internal+0x128/0x1f0 driver_detach+0x6c/0xe0 bus_remove_driver+0x74/0x100 driver_unregister+0x34/0x60 pci_unregister_driver+0x24/0xd8 hibmc_pci_driver_exit+0x14/0xe858 [hibmc_drm] __arm64_sys_delete_module+0x1fc/0x2d0 el0_svc_common.constprop.3+0xa8/0x188 do_el0_svc+0x80/0xa0 el0_sync_handler+0x8c/0xb0 el0_sync+0x15c/0x180 ---[ end trace 00718630d6e5ff18 ]--- Signed-off-by: Tian Tao <[email protected]> Acked-by: Thomas Zimmermann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Sasha Levin <[email protected]>

Leon Hwang says: ==================== In the discussion thread "[PATCH bpf-next v9 0/7] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu maps"[1], it was pointed out that missing calls to bpf_obj_free_fields() could lead to memory leaks. A selftest was added to confirm that this is indeed a real issue - the refcount of BPF_KPTR_REF field is not decremented when bpf_obj_free_fields() is missing after copy_map_value[,_long](). Further inspection of copy_map_value[,_long]() call sites revealed two locations affected by this issue: 1. pcpu_copy_value() 2. htab_map_update_elem() when used with BPF_F_LOCK Similar case happens when update local storage maps with BPF_F_LOCK. This series fixes the cases where BPF_F_LOCK is not involved by properly calling bpf_obj_free_fields() after copy_map_value[,_long](), and adds a selftest to verify the fix. The remaining cases involving BPF_F_LOCK will be addressed in a separate patch set after the series "bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu maps" is applied. Changes: v5 -> v6: * Update the test name to include "refcounted_kptr". * Update some local variables' name in the test (per Alexei). * v5: https://lore.kernel.org/bpf/[email protected]/ v4 -> v5: * Use a local variable to store the this_cpu_ptr()/per_cpu_ptr() result, and reuse it between copy_map_value[,_long]() and bpf_obj_free_fields() in patch gregkh#1 (per Andrii). * Drop patch gregkh#2 and gregkh#3, because the combination of BPF_F_LOCK with other special fields (except for BPF_SPIN_LOCK) will be disallowed on the UAPI side in the future (per Alexei). * v4: https://lore.kernel.org/bpf/[email protected]/ v3 -> v4: * Target bpf-next tree. * Address comments from Amery: * Drop 'bpf_obj_free_fields()' in the path of updating local storage maps without BPF_F_LOCK. * Drop the corresponding self test. * Respin the other test of local storage maps using syscall BPF programs. * v3: https://lore.kernel.org/bpf/[email protected]/ v2 -> v3: * Free special fields when update local storage maps without BPF_F_LOCK. * Add test to verify decrementing refcount when update cgroup local storage maps without BPF_F_LOCK. * Address review from AI bot: * Slow path with BPF_F_LOCK (around line 642-646) in 'bpf_local_storage.c'. * v2: https://lore.kernel.org/bpf/[email protected]/ v1 -> v2: * Add test to verify decrementing refcount when update cgroup local storage maps with BPF_F_LOCK. * Address review from AI bot: * Fast path without bucket lock (around line 610) in 'bpf_local_storage.c'. * v1: https://lore.kernel.org/bpf/[email protected]/ ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>

Ido Schimmel says: ==================== icmp: Add RFC 5837 support tl;dr ===== This patchset extends certain ICMP error messages (e.g., "Time Exceeded") with incoming interface information in accordance with RFC 5837 [1]. This is required for more meaningful traceroute results in unnumbered networks. Like other ICMP settings, the feature is controlled via a per-{netns, address family} sysctl. The interface and the implementation are designed to support more ICMP extensions. Motivation ========== Over the years, the kernel was extended with the ability to derive the source IP of ICMP error messages from the interface that received the datagram which elicited the ICMP error [2][3][4]. This is especially important for "Time Exceeded" messages as it allows traceroute users to trace the actual packet path along the network. The above scheme does not work in unnumbered networks. In these networks, only the loopback / VRF interface is assigned a global IP address while router interfaces are assigned IPv6 link-local addresses. As such, ICMP error messages are generated with a source IP derived from the loopback / VRF interface, making it impossible to trace the actual packet path when parallel links exist between routers. The problem can be solved by implementing the solution proposed by RFC 4884 [5] and RFC 5837. The former defines an ICMP extension structure that can be appended to selected ICMP messages and carry extension objects. The latter defines an extension object called the "Interface Information Object" (IIO) that can carry interface information (e.g., name, index, MTU) about interfaces with certain roles such as the interface that received the datagram which elicited the ICMP error. The payload of the datagram that elicited the error (potentially padded / trimmed) along with the ICMP extension structure will be queued to the error queue of the originating socket, thereby allowing traceroute applications to parse and display the information encoded in the ICMP extension structure. Example: # traceroute6 -e 2001:db8:1::3 traceroute to 2001:db8:1::3 (2001:db8:1::3), 30 hops max, 80 byte packets 1 2001:db8:1::2 (2001:db8:1::2) <INC:11,"eth1",mtu=1500> 0.214 ms 0.171 ms 0.162 ms 2 2001:db8:1::3 (2001:db8:1::3) <INC:12,"eth2",mtu=1500> 0.154 ms 0.135 ms 0.127 ms # traceroute -e 192.0.2.3 traceroute to 192.0.2.3 (192.0.2.3), 30 hops max, 60 byte packets 1 192.0.2.2 (192.0.2.2) <INC:11,"eth1",mtu=1500> 0.191 ms 0.148 ms 0.144 ms 2 192.0.2.3 (192.0.2.3) <INC:12,"eth2",mtu=1500> 0.137 ms 0.122 ms 0.114 ms Implementation ============== As previously stated, the feature is controlled via a per-{netns, address} sysctl. Specifically, a bit mask where each bit controls the addition of a different ICMP extension to ICMP error messages. Currently, only a single value is supported, to append the incoming interface information. Key points: 1. Global knob vs finer control. I am not aware of users who require finer control, but it is possible that some users will want to avoid appending ICMP extensions when the packet is sent out of a specific interface (e.g., the management interface) or to a specific subnet. This can be accomplished via a tc-bpf program that trims the ICMP extension structure. An example program can be found here [6]. 2. Split implementation between IPv4 / IPv6. While the implementation is currently similar, there are some differences between both address families. In addition, some extensions (e.g., RFC 8883 [7]) are IPv6-specific. Given the above and given that the implementation is not very complex, it makes sense to keep both implementations separate. 3. Compatibility with legacy applications. RFC 4884 from 2007 extended certain ICMP messages with a length field that encodes the length of the "original datagram" field, so that applications will be able to tell where the "original datagram" ends and where the ICMP extension structure starts. Before the introduction of the IP{,6}_RECVERR_RFC4884 socket options [8][9] in 2020 it was impossible for applications to know where the ICMP extension structure starts and to this day some applications assume that it starts at offset 128, which is the minimum length of the "original datagram" field as specified by RFC 4884. Therefore, in order to be compatible with both legacy and modern applications, the datagram that elicited the ICMP error is trimmed / padded to 128 bytes before appending the ICMP extension structure. This behavior is specifically called out by RFC 4884: "Those wishing to be backward compatible with non-compliant TRACEROUTE implementations will include exactly 128 octets" [10]. Note that in 128 bytes we should be able to include enough headers for the originating node to match the ICMP error message with the relevant socket. For example, the following headers will be present in the "original datagram" field when a VXLAN encapsulated IPv6 packet elicits an ICMP error in an IPv6 underlay: IPv6 (40) | UDP (8) | VXLAN (8) | Eth (14) | IPv6 (40) | UDP (8). Overall, 118 bytes. If the 128 bytes limit proves to be insufficient for some use case, we can consider dedicating a new bit in the previously mentioned sysctl to allow for more bytes to be included in the "original datagram" field. 4. Extensibility. This patchset adds partial support for a single ICMP extension. However, the interface and the implementation should be able to support more extensions, if needed. Examples: * More interface information objects as part of RFC 5837. We should be able to derive the outgoing interface information and nexthop IP from the dst entry attached to the packet that elicited the error. * Node identification object (e.g., hostname / loopback IP) [11]. * Extended Information object which encodes aggregate header limits as part of RFC 8883. A previous proposal from Ishaan Gandhi and Ron Bonica is available here [12]. Testing ======= The existing traceroute selftest is extended to test that ICMP extensions are reported correctly when enabled. Both address families are tested and with different packet sizes in order to make sure that trimming / padding works correctly. Tested that packets are parsed correctly by the IP{,6}_RECVERR_RFC4884 socket options using Willem's selftest [13]. Changelog ========= Changes since v1 [14]: * Patches gregkh#1-gregkh#2: Added a comment about field ordering and review tags. * Patch gregkh#3: Converted "sysctl" to "echo" when testing the return value. Added a check to skip the test if traceroute version is older than 2.1.5. [1] https://datatracker.ietf.org/doc/html/rfc5837 [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1c2fb7f93cb20621772bf304f3dba0849942e5db [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fac6fce9bdb59837bb89930c3a92f5e0d1482f0b [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4a8c416602d97a4e2073ed563d4d4c7627de19cf [5] https://datatracker.ietf.org/doc/html/rfc4884 [6] https://gist.github.com/idosch/5013448cdb5e9e060e6bfdc8b433577c [7] https://datatracker.ietf.org/doc/html/rfc8883 [8] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eba75c587e811d3249c8bd50d22bb2266ccd3c0f [9] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=01370434df85eb76ecb1527a4466013c4aca2436 [10] https://datatracker.ietf.org/doc/html/rfc4884#section-5.3 [11] https://datatracker.ietf.org/doc/html/draft-ietf-intarea-extended-icmp-nodeid-04 [12] https://lore.kernel.org/netdev/[email protected]/ [13] https://lore.kernel.org/netdev/aPpMItF35gwpgzZx@shredder/ [14] https://lore.kernel.org/netdev/[email protected]/ ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Marc Kleine-Budde <[email protected]> says: Similarly to how CAN FD reuses the bittiming logic of Classical CAN, CAN XL also reuses the entirety of CAN FD features, and, on top of that, adds new features which are specific to CAN XL. A so-called 'mixed-mode' is intended to have (XL-tolerant) CAN FD nodes and CAN XL nodes on one CAN segment, where the FD-controllers can talk CC/FD and the XL-controllers can talk CC/FD/XL. This mixed-mode utilizes the known error-signalling (ES) for sending CC/FD/XL frames. For CAN FD and CAN XL the tranceiver delay compensation (TDC) is supported to use common CAN and CAN-SIG transceivers. The CANXL-only mode disables the error-signalling in the CAN XL controller. This mode does not allow CC/FD frames to be sent but additionally offers a CAN XL transceiver mode switching (TMS) to send CAN XL frames with up to 20Mbit/s data rate. The TMS utilizes a PWM configuration which is added to the netlink interface. Configured with CAN_CTRLMODE_FD and CAN_CTRLMODE_XL this leads to: FD=0 XL=0 CC-only mode (ES=1) FD=1 XL=0 FD/CC mixed-mode (ES=1) FD=1 XL=1 XL/FD/CC mixed-mode (ES=1) FD=0 XL=1 XL-only mode (ES=0, TMS optional) Patch gregkh#1 print defined ctrlmode strings capitalized to increase the readability and to be in line with the 'ip' tool (iproute2). Patch gregkh#2 is a small clean-up which makes can_calc_bittiming() use NL_SET_ERR_MSG() instead of netdev_err(). Patch gregkh#3 adds a check in can_dev_dropped_skb() to drop CAN FD frames when CAN FD is turned off. Patch gregkh#4 adds CAN_CTRLMODE_RESTRICTED. Note that contrary to the other CAN_CTRL_MODE_XL_* that are introduced in the later patches, this control mode is not specific to CAN XL. The nuance is that because this restricted mode was only added in ISO 11898-1:2024, it is made mandatory for CAN XL devices but optional for other protocols. This is why this patch is added as a preparation before introducing the core CAN XL logic. Patch gregkh#5 adds all the CAN XL features which are inherited from CAN FD: the nominal bittiming, the data bittiming and the TDC. Patch gregkh#6 add a new CAN_CTRLMODE_XL_TMS control mode which is specific to CAN XL to enable the transceiver mode switching (TMS) in XL-only mode. Patch gregkh#7 adds a check in can_dev_dropped_skb() to drop CAN CC/FD frames when the CAN XL controller is in CAN XL-only mode. The introduced can_dev_in_xl_only_mode() function also determines the error-signalling configuration for the CAN XL controllers. Patch gregkh#8 to gregkh#11 add the PWM logic for the CAN XL TMS mode. Patch gregkh#12 to gregkh#14 add different default sample-points for standard CAN and CAN SIG transceivers (with TDC) and CAN XL transceivers using PWM in the CAN XL TMS mode. Patch gregkh#15 add a dummy_can driver for netlink testing and debugging. Patch gregkh#16 check CAN frame type (CC/FD/XL) when writing those frames to the CAN_RAW socket and reject them if it's not supported by the CAN interface. Patch gregkh#17 increase the resolution when printing the bitrate error and round-up the value to 0.01% in the case the resolution would still provide values which would lead to 0.00%. Link: https://patch.msgid.link/[email protected] Signed-off-by: Marc Kleine-Budde <[email protected]>

It's possible that the auxiliary proxy device we add when setting up the GPIO controller exposing shared pins, will get matched and probed immediately. The gpio-proxy-driver will then retrieve the shared descriptor structure. That will cause a recursive mutex locking and a deadlock because we're already holding the gpio_shared_lock in gpio_device_setup_shared() and try to take it again in devm_gpiod_shared_get() like this: [ 4.298346] gpiolib_shared: GPIO 130 owned by f100000.pinctrl is shared by multiple consumers [ 4.307157] gpiolib_shared: Setting up a shared GPIO entry for speaker@0,3 [ 4.314604] [ 4.316146] ============================================ [ 4.321600] WARNING: possible recursive locking detected [ 4.327054] 6.18.0-rc7-next-20251125-g3f300d0674f6-dirty #3887 Not tainted [ 4.334115] -------------------------------------------- [ 4.339566] kworker/u32:3/71 is trying to acquire lock: [ 4.344931] ffffda019ba71850 (gpio_shared_lock){+.+.}-{4:4}, at: devm_gpiod_shared_get+0x34/0x2e0 [ 4.354057] [ 4.354057] but task is already holding lock: [ 4.360041] ffffda019ba71850 (gpio_shared_lock){+.+.}-{4:4}, at: gpio_device_setup_shared+0x30/0x268 [ 4.369421] [ 4.369421] other info that might help us debug this: [ 4.376126] Possible unsafe locking scenario: [ 4.376126] [ 4.382198] CPU0 [ 4.384711] ---- [ 4.387223] lock(gpio_shared_lock); [ 4.390992] lock(gpio_shared_lock); [ 4.394761] [ 4.394761] *** DEADLOCK *** [ 4.394761] [ 4.400832] May be due to missing lock nesting notation [ 4.400832] [ 4.407802] 5 locks held by kworker/u32:3/71: [ 4.412279] #0: ffff000080020948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x194/0x64c [ 4.422650] gregkh#1: ffff800080963d60 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1bc/0x64c [ 4.432117] gregkh#2: ffff00008165c8f8 (&dev->mutex){....}-{4:4}, at: __device_attach+0x3c/0x198 [ 4.440700] gregkh#3: ffffda019ba71850 (gpio_shared_lock){+.+.}-{4:4}, at: gpio_device_setup_shared+0x30/0x268 [ 4.450523] gregkh#4: ffff0000810fe918 (&dev->mutex){....}-{4:4}, at: __device_attach+0x3c/0x198 [ 4.459103] [ 4.459103] stack backtrace: [ 4.463581] CPU: 6 UID: 0 PID: 71 Comm: kworker/u32:3 Not tainted 6.18.0-rc7-next-20251125-g3f300d0674f6-dirty #3887 PREEMPT [ 4.463589] Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT) [ 4.463593] Workqueue: events_unbound deferred_probe_work_func [ 4.463602] Call trace: [ 4.463604] show_stack+0x18/0x24 (C) [ 4.463617] dump_stack_lvl+0x70/0x98 [ 4.463627] dump_stack+0x18/0x24 [ 4.463636] print_deadlock_bug+0x224/0x238 [ 4.463643] __lock_acquire+0xe4c/0x15f0 [ 4.463648] lock_acquire+0x1cc/0x344 [ 4.463653] __mutex_lock+0xb8/0x840 [ 4.463661] mutex_lock_nested+0x24/0x30 [ 4.463667] devm_gpiod_shared_get+0x34/0x2e0 [ 4.463674] gpio_shared_proxy_probe+0x18/0x138 [ 4.463682] auxiliary_bus_probe+0x40/0x78 [ 4.463688] really_probe+0xbc/0x2c0 [ 4.463694] __driver_probe_device+0x78/0x120 [ 4.463701] driver_probe_device+0x3c/0x160 [ 4.463708] __device_attach_driver+0xb8/0x140 [ 4.463716] bus_for_each_drv+0x88/0xe8 [ 4.463723] __device_attach+0xa0/0x198 [ 4.463729] device_initial_probe+0x14/0x20 [ 4.463737] bus_probe_device+0xb4/0xc0 [ 4.463743] device_add+0x578/0x76c [ 4.463747] __auxiliary_device_add+0x40/0xac [ 4.463752] gpio_device_setup_shared+0x1f8/0x268 [ 4.463758] gpiochip_add_data_with_key+0xdac/0x10ac [ 4.463763] devm_gpiochip_add_data_with_key+0x30/0x80 [ 4.463768] msm_pinctrl_probe+0x4b0/0x5e0 [ 4.463779] sm8250_pinctrl_probe+0x18/0x40 [ 4.463784] platform_probe+0x5c/0xa4 [ 4.463793] really_probe+0xbc/0x2c0 [ 4.463800] __driver_probe_device+0x78/0x120 [ 4.463807] driver_probe_device+0x3c/0x160 [ 4.463814] __device_attach_driver+0xb8/0x140 [ 4.463821] bus_for_each_drv+0x88/0xe8 [ 4.463827] __device_attach+0xa0/0x198 [ 4.463834] device_initial_probe+0x14/0x20 [ 4.463841] bus_probe_device+0xb4/0xc0 [ 4.463847] deferred_probe_work_func+0x90/0xcc [ 4.463854] process_one_work+0x214/0x64c [ 4.463860] worker_thread+0x1bc/0x360 [ 4.463866] kthread+0x14c/0x220 [ 4.463871] ret_from_fork+0x10/0x20 [ 77.265041] random: crng init done Fortunately, at the time of creating of the auxiliary device, we already know the correct entry so let's store it as the device's platform data. We don't need to hold gpio_shared_lock in devm_gpiod_shared_get() as we're not removing the entry or traversing the list anymore but we still need to protect it from concurrent modification of its fields so add a more fine-grained mutex. Fixes: a060b8c ("gpiolib: implement low-level, shared GPIO support") Reported-by: Dmitry Baryshkov <[email protected]> Closes: https://lore.kernel.org/all/fimuvblfy2cmn7o4wzcxjzrux5mwhvlvyxfsgeqs6ore2xg75i@ax46d3sfmdux/ Reviewed-by: Dmitry Baryshkov <[email protected]> Tested-by: Dmitry Baryshkov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bartosz Golaszewski <[email protected]>

…ockdep While developing IPPROTO_SMBDIRECT support for the code under fs/smb/common/smbdirect [1], I noticed false positives like this: [T79] ====================================================== [T79] WARNING: possible circular locking dependency detected [T79] 6.18.0-rc4-metze-kasan-lockdep.01+ gregkh#1 Tainted: G OE [T79] ------------------------------------------------------ [T79] kworker/2:0/79 is trying to acquire lock: [T79] ffff88801f968278 (sk_lock-AF_INET){+.+.}-{0:0}, at: sock_set_reuseaddr+0x14/0x70 [T79] but task is already holding lock: [T79] ffffffffc10f7230 (lock#9){+.+.}-{4:4}, at: rdma_listen+0x3d2/0x740 [rdma_cm] [T79] which lock already depends on the new lock. [T79] the existing dependency chain (in reverse order) is: [T79] -> gregkh#1 (lock#9){+.+.}-{4:4}: [T79] __lock_acquire+0x535/0xc30 [T79] lock_acquire.part.0+0xb3/0x240 [T79] lock_acquire+0x60/0x140 [T79] __mutex_lock+0x1af/0x1c10 [T79] mutex_lock_nested+0x1b/0x30 [T79] cma_get_port+0xba/0x7d0 [rdma_cm] [T79] rdma_bind_addr_dst+0x598/0x9a0 [rdma_cm] [T79] cma_bind_addr+0x107/0x320 [rdma_cm] [T79] rdma_resolve_addr+0xa3/0x830 [rdma_cm] [T79] destroy_lease_table+0x12b/0x420 [ksmbd] [T79] ksmbd_NTtimeToUnix+0x3e/0x80 [ksmbd] [T79] ndr_encode_posix_acl+0x6e9/0xab0 [ksmbd] [T79] ndr_encode_v4_ntacl+0x53/0x870 [ksmbd] [T79] __sys_connect_file+0x131/0x1c0 [T79] __sys_connect+0x111/0x140 [T79] __x64_sys_connect+0x72/0xc0 [T79] x64_sys_call+0xe7d/0x26a0 [T79] do_syscall_64+0x93/0xff0 [T79] entry_SYSCALL_64_after_hwframe+0x76/0x7e [T79] -> #0 (sk_lock-AF_INET){+.+.}-{0:0}: [T79] check_prev_add+0xf3/0xcd0 [T79] validate_chain+0x466/0x590 [T79] __lock_acquire+0x535/0xc30 [T79] lock_acquire.part.0+0xb3/0x240 [T79] lock_acquire+0x60/0x140 [T79] lock_sock_nested+0x3b/0xf0 [T79] sock_set_reuseaddr+0x14/0x70 [T79] siw_create_listen+0x145/0x1540 [siw] [T79] iw_cm_listen+0x313/0x5b0 [iw_cm] [T79] cma_iw_listen+0x271/0x3c0 [rdma_cm] [T79] rdma_listen+0x3b1/0x740 [rdma_cm] [T79] cma_listen_on_dev+0x46a/0x750 [rdma_cm] [T79] rdma_listen+0x4b0/0x740 [rdma_cm] [T79] ksmbd_rdma_init+0x12b/0x270 [ksmbd] [T79] ksmbd_conn_transport_init+0x26/0x70 [ksmbd] [T79] server_ctrl_handle_work+0x1e5/0x280 [ksmbd] [T79] process_one_work+0x86c/0x1930 [T79] worker_thread+0x6f0/0x11f0 [T79] kthread+0x3ec/0x8b0 [T79] ret_from_fork+0x314/0x400 [T79] ret_from_fork_asm+0x1a/0x30 [T79] other info that might help us debug this: [T79] Possible unsafe locking scenario: [T79] CPU0 CPU1 [T79] ---- ---- [T79] lock(lock#9); [T79] lock(sk_lock-AF_INET); [T79] lock(lock#9); [T79] lock(sk_lock-AF_INET); [T79] *** DEADLOCK *** [T79] 5 locks held by kworker/2:0/79: [T79] #0: ffff88800120b158 ((wq_completion)events_long){+.+.}-{0:0}, at: process_one_work+0xfca/0x1930 [T79] gregkh#1: ffffc9000474fd00 ((work_completion)(&ctrl->ctrl_work)) {+.+.}-{0:0}, at: process_one_work+0x804/0x1930 [T79] gregkh#2: ffffffffc11307d0 (ctrl_lock){+.+.}-{4:4}, at: server_ctrl_handle_work+0x21/0x280 [ksmbd] [T79] gregkh#3: ffffffffc11347b0 (init_lock){+.+.}-{4:4}, at: ksmbd_conn_transport_init+0x18/0x70 [ksmbd] [T79] gregkh#4: ffffffffc10f7230 (lock#9){+.+.}-{4:4}, at: rdma_listen+0x3d2/0x740 [rdma_cm] [T79] stack backtrace: [T79] CPU: 2 UID: 0 PID: 79 Comm: kworker/2:0 Kdump: loaded Tainted: G OE 6.18.0-rc4-metze-kasan-lockdep.01+ gregkh#1 PREEMPT(voluntary) [T79] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [T79] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [T79] Workqueue: events_long server_ctrl_handle_work [ksmbd] ... [T79] print_circular_bug+0xfd/0x130 [T79] check_noncircular+0x150/0x170 [T79] check_prev_add+0xf3/0xcd0 [T79] validate_chain+0x466/0x590 [T79] __lock_acquire+0x535/0xc30 [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] lock_acquire.part.0+0xb3/0x240 [T79] ? sock_set_reuseaddr+0x14/0x70 [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? __kasan_check_write+0x14/0x30 [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? apparmor_socket_post_create+0x180/0x700 [T79] lock_acquire+0x60/0x140 [T79] ? sock_set_reuseaddr+0x14/0x70 [T79] lock_sock_nested+0x3b/0xf0 [T79] ? sock_set_reuseaddr+0x14/0x70 [T79] sock_set_reuseaddr+0x14/0x70 [T79] siw_create_listen+0x145/0x1540 [siw] [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? local_clock_noinstr+0xe/0xd0 [T79] ? __pfx_siw_create_listen+0x10/0x10 [siw] [T79] ? trace_preempt_on+0x4c/0x130 [T79] ? __raw_spin_unlock_irqrestore+0x4a/0x90 [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? preempt_count_sub+0x52/0x80 [T79] iw_cm_listen+0x313/0x5b0 [iw_cm] [T79] cma_iw_listen+0x271/0x3c0 [rdma_cm] [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] rdma_listen+0x3b1/0x740 [rdma_cm] [T79] ? _raw_spin_unlock+0x2c/0x60 [T79] ? __pfx_rdma_listen+0x10/0x10 [rdma_cm] [T79] ? rdma_restrack_add+0x12c/0x630 [ib_core] [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] cma_listen_on_dev+0x46a/0x750 [rdma_cm] [T79] rdma_listen+0x4b0/0x740 [rdma_cm] [T79] ? __pfx_rdma_listen+0x10/0x10 [rdma_cm] [T79] ? cma_get_port+0x30d/0x7d0 [rdma_cm] [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? rdma_bind_addr_dst+0x598/0x9a0 [rdma_cm] [T79] ksmbd_rdma_init+0x12b/0x270 [ksmbd] [T79] ? __pfx_ksmbd_rdma_init+0x10/0x10 [ksmbd] [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? register_netdevice_notifier+0x1dc/0x240 [T79] ksmbd_conn_transport_init+0x26/0x70 [ksmbd] [T79] server_ctrl_handle_work+0x1e5/0x280 [ksmbd] [T79] process_one_work+0x86c/0x1930 [T79] ? __pfx_process_one_work+0x10/0x10 [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? assign_work+0x16f/0x280 [T79] worker_thread+0x6f0/0x11f0 I was not able to reproduce this as I was testing with various runs switching siw and rxe as well as IPPROTO_SMBDIRECT sockets, while the above stack used siw with the non IPPROTO_SMBDIRECT patches [1]. Even if this patch doesn't solve the above I think it's a good idea to reclassify the sockets used by siw, I also send patches for rxe to reclassify, as well as my IPPROTO_SMBDIRECT socket patches [1] will do it, this should minimize potential false positives. [1] https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=refs/heads/master-ipproto-smbdirect Cc: Bernard Metzler <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Stefan Metzmacher <[email protected]> Link: https://patch.msgid.link/[email protected] Acked-by: Bernard Metzler <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>

The following warning appears when running syzkaller, and this issue also exists in the mainline code. ------------[ cut here ]------------ list_add double add: new=ffffffffa57eee28, prev=ffffffffa57eee28, next=ffffffffa5e63100. WARNING: CPU: 0 PID: 1491 at lib/list_debug.c:35 __list_add_valid_or_report+0xf7/0x130 Modules linked in: CPU: 0 PID: 1491 Comm: syz.1.28 Not tainted 6.6.0+ gregkh#3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:__list_add_valid_or_report+0xf7/0x130 RSP: 0018:ff1100010dfb7b78 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffffa57eee18 RCX: ffffffff97fc9817 RDX: 0000000000040000 RSI: ffa0000002383000 RDI: 0000000000000001 RBP: ffffffffa57eee28 R08: 0000000000000001 R09: ffe21c0021bf6f2c R10: 0000000000000001 R11: 6464615f7473696c R12: ffffffffa5e63100 R13: ffffffffa57eee28 R14: ffffffffa57eee28 R15: ff1100010dfb7d48 FS: 00007fb14398b640(0000) GS:ff11000119600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000010d096005 CR4: 0000000000773ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 80000000 Call Trace: <TASK> input_register_handler+0xb3/0x210 mac_hid_start_emulation+0x1c5/0x290 mac_hid_toggle_emumouse+0x20a/0x240 proc_sys_call_handler+0x4c2/0x6e0 new_sync_write+0x1b1/0x2d0 vfs_write+0x709/0x950 ksys_write+0x12a/0x250 do_syscall_64+0x5a/0x110 entry_SYSCALL_64_after_hwframe+0x78/0xe2 The WARNING occurs when two processes concurrently write to the mac-hid emulation sysctl, causing a race condition in mac_hid_toggle_emumouse(). Both processes read old_val=0, then both try to register the input handler, leading to a double list_add of the same handler. CPU0 CPU1 ------------------------- ------------------------- vfs_write() //write 1 vfs_write() //write 1 proc_sys_write() proc_sys_write() mac_hid_toggle_emumouse() mac_hid_toggle_emumouse() old_val = *valp // old_val=0 old_val = *valp // old_val=0 mutex_lock_killable() proc_dointvec() // *valp=1 mac_hid_start_emulation() input_register_handler() mutex_unlock() mutex_lock_killable() proc_dointvec() mac_hid_start_emulation() input_register_handler() //Trigger Warning mutex_unlock() Fix this by moving the old_val read inside the mutex lock region. Fixes: 99b089c ("Input: Mac button emulation - implement as an input filter") Signed-off-by: Long Li <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected]

When a large VM, specifically one that holds a significant number of PTEs, gets abruptly destroyed, the following warning is seen during the page-table walk: sched: CPU 0 need_resched set for > 100018840 ns (100 ticks) without schedule CPU: 0 UID: 0 PID: 9617 Comm: kvm_page_table_ Tainted: G O 6.16.0-smp-DEV gregkh#3 NONE Tainted: [O]=OOT_MODULE Call trace: show_stack+0x20/0x38 (C) dump_stack_lvl+0x3c/0xb8 dump_stack+0x18/0x30 resched_latency_warn+0x7c/0x88 sched_tick+0x1c4/0x268 update_process_times+0xa8/0xd8 tick_nohz_handler+0xc8/0x168 __hrtimer_run_queues+0x11c/0x338 hrtimer_interrupt+0x104/0x308 arch_timer_handler_phys+0x40/0x58 handle_percpu_devid_irq+0x8c/0x1b0 generic_handle_domain_irq+0x48/0x78 gic_handle_irq+0x1b8/0x408 call_on_irq_stack+0x24/0x30 do_interrupt_handler+0x54/0x78 el1_interrupt+0x44/0x88 el1h_64_irq_handler+0x18/0x28 el1h_64_irq+0x84/0x88 stage2_free_walker+0x30/0xa0 (P) __kvm_pgtable_walk+0x11c/0x258 __kvm_pgtable_walk+0x180/0x258 __kvm_pgtable_walk+0x180/0x258 __kvm_pgtable_walk+0x180/0x258 kvm_pgtable_walk+0xc4/0x140 kvm_pgtable_stage2_destroy+0x5c/0xf0 kvm_free_stage2_pgd+0x6c/0xe8 kvm_uninit_stage2_mmu+0x24/0x48 kvm_arch_flush_shadow_all+0x80/0xa0 kvm_mmu_notifier_release+0x38/0x78 __mmu_notifier_release+0x15c/0x250 exit_mmap+0x68/0x400 __mmput+0x38/0x1c8 mmput+0x30/0x68 exit_mm+0xd4/0x198 do_exit+0x1a4/0xb00 do_group_exit+0x8c/0x120 get_signal+0x6d4/0x778 do_signal+0x90/0x718 do_notify_resume+0x70/0x170 el0_svc+0x74/0xd8 el0t_64_sync_handler+0x60/0xc8 el0t_64_sync+0x1b0/0x1b8 The warning is seen majorly on the host kernels that are configured not to force-preempt, such as CONFIG_PREEMPT_NONE=y. To avoid this, instead of walking the entire page-table in one go, split it into smaller ranges, by checking for cond_resched() between each range. Since the path is executed during VM destruction, after the page-table structure is unlinked from the KVM MMU, relying on cond_resched_rwlock_write() isn't necessary. Signed-off-by: Raghavendra Rao Ananta <[email protected]> Link: https://msgid.link/[email protected] Signed-off-by: Oliver Upton <[email protected]>

[ Upstream commit a91c809 ] The original code causes a circular locking dependency found by lockdep. ====================================================== WARNING: possible circular locking dependency detected 6.16.0-rc6-lgci-xe-xe-pw-151626v3+ gregkh#1 Tainted: G S U ------------------------------------------------------ xe_fault_inject/5091 is trying to acquire lock: ffff888156815688 ((work_completion)(&(&devcd->del_wk)->work)){+.+.}-{0:0}, at: __flush_work+0x25d/0x660 but task is already holding lock: ffff888156815620 (&devcd->mutex){+.+.}-{3:3}, at: dev_coredump_put+0x3f/0xa0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> gregkh#2 (&devcd->mutex){+.+.}-{3:3}: mutex_lock_nested+0x4e/0xc0 devcd_data_write+0x27/0x90 sysfs_kf_bin_write+0x80/0xf0 kernfs_fop_write_iter+0x169/0x220 vfs_write+0x293/0x560 ksys_write+0x72/0xf0 __x64_sys_write+0x19/0x30 x64_sys_call+0x2bf/0x2660 do_syscall_64+0x93/0xb60 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> gregkh#1 (kn->active#236){++++}-{0:0}: kernfs_drain+0x1e2/0x200 __kernfs_remove+0xae/0x400 kernfs_remove_by_name_ns+0x5d/0xc0 remove_files+0x54/0x70 sysfs_remove_group+0x3d/0xa0 sysfs_remove_groups+0x2e/0x60 device_remove_attrs+0xc7/0x100 device_del+0x15d/0x3b0 devcd_del+0x19/0x30 process_one_work+0x22b/0x6f0 worker_thread+0x1e8/0x3d0 kthread+0x11c/0x250 ret_from_fork+0x26c/0x2e0 ret_from_fork_asm+0x1a/0x30 -> #0 ((work_completion)(&(&devcd->del_wk)->work)){+.+.}-{0:0}: __lock_acquire+0x1661/0x2860 lock_acquire+0xc4/0x2f0 __flush_work+0x27a/0x660 flush_delayed_work+0x5d/0xa0 dev_coredump_put+0x63/0xa0 xe_driver_devcoredump_fini+0x12/0x20 [xe] devm_action_release+0x12/0x30 release_nodes+0x3a/0x120 devres_release_all+0x8a/0xd0 device_unbind_cleanup+0x12/0x80 device_release_driver_internal+0x23a/0x280 device_driver_detach+0x14/0x20 unbind_store+0xaf/0xc0 drv_attr_store+0x21/0x50 sysfs_kf_write+0x4a/0x80 kernfs_fop_write_iter+0x169/0x220 vfs_write+0x293/0x560 ksys_write+0x72/0xf0 __x64_sys_write+0x19/0x30 x64_sys_call+0x2bf/0x2660 do_syscall_64+0x93/0xb60 entry_SYSCALL_64_after_hwframe+0x76/0x7e other info that might help us debug this: Chain exists of: (work_completion)(&(&devcd->del_wk)->work) --> kn->active#236 --> &devcd->mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&devcd->mutex); lock(kn->active#236); lock(&devcd->mutex); lock((work_completion)(&(&devcd->del_wk)->work)); *** DEADLOCK *** 5 locks held by xe_fault_inject/5091: #0: ffff8881129f9488 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x72/0xf0 gregkh#1: ffff88810c755078 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x123/0x220 gregkh#2: ffff8881054811a0 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x55/0x280 gregkh#3: ffff888156815620 (&devcd->mutex){+.+.}-{3:3}, at: dev_coredump_put+0x3f/0xa0 gregkh#4: ffffffff8359e020 (rcu_read_lock){....}-{1:2}, at: __flush_work+0x72/0x660 stack backtrace: CPU: 14 UID: 0 PID: 5091 Comm: xe_fault_inject Tainted: G S U 6.16.0-rc6-lgci-xe-xe-pw-151626v3+ gregkh#1 PREEMPT_{RT,(lazy)} Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER Hardware name: Micro-Star International Co., Ltd. MS-7D25/PRO Z690-A DDR4(MS-7D25), BIOS 1.10 12/13/2021 Call Trace: <TASK> dump_stack_lvl+0x91/0xf0 dump_stack+0x10/0x20 print_circular_bug+0x285/0x360 check_noncircular+0x135/0x150 ? register_lock_class+0x48/0x4a0 __lock_acquire+0x1661/0x2860 lock_acquire+0xc4/0x2f0 ? __flush_work+0x25d/0x660 ? mark_held_locks+0x46/0x90 ? __flush_work+0x25d/0x660 __flush_work+0x27a/0x660 ? __flush_work+0x25d/0x660 ? trace_hardirqs_on+0x1e/0xd0 ? __pfx_wq_barrier_func+0x10/0x10 flush_delayed_work+0x5d/0xa0 dev_coredump_put+0x63/0xa0 xe_driver_devcoredump_fini+0x12/0x20 [xe] devm_action_release+0x12/0x30 release_nodes+0x3a/0x120 devres_release_all+0x8a/0xd0 device_unbind_cleanup+0x12/0x80 device_release_driver_internal+0x23a/0x280 ? bus_find_device+0xa8/0xe0 device_driver_detach+0x14/0x20 unbind_store+0xaf/0xc0 drv_attr_store+0x21/0x50 sysfs_kf_write+0x4a/0x80 kernfs_fop_write_iter+0x169/0x220 vfs_write+0x293/0x560 ksys_write+0x72/0xf0 __x64_sys_write+0x19/0x30 x64_sys_call+0x2bf/0x2660 do_syscall_64+0x93/0xb60 ? __f_unlock_pos+0x15/0x20 ? __x64_sys_getdents64+0x9b/0x130 ? __pfx_filldir64+0x10/0x10 ? do_syscall_64+0x1a2/0xb60 ? clear_bhb_loop+0x30/0x80 ? clear_bhb_loop+0x30/0x80 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x76e292edd574 Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d d5 ea 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89 RSP: 002b:00007fffe247a828 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000076e292edd574 RDX: 000000000000000c RSI: 00006267f6306063 RDI: 000000000000000b RBP: 000000000000000c R08: 000076e292fc4b20 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000202 R12: 00006267f6306063 R13: 000000000000000b R14: 00006267e6859c00 R15: 000076e29322a000 </TASK> xe 0000:03:00.0: [drm] Xe device coredump has been deleted. Fixes: 01daccf ("devcoredump : Serialize devcd_del work") Cc: Mukesh Ojha <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Danilo Krummrich <[email protected]> Cc: [email protected] Cc: [email protected] # v6.1+ Signed-off-by: Maarten Lankhorst <[email protected]> Cc: Matthew Brost <[email protected]> Acked-by: Mukesh Ojha <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> [ replaced disable_delayed_work_sync() with cancel_delayed_work_sync() ] Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 2d8ab77 upstream. The U-Blox EVK-M101 enumerates as 1546:0506 [1] with four FTDI interfaces: - EVK-M101 current sensors - EVK-M101 I2C - EVK-M101 UART - EVK-M101 port D Only the third USB interface is a UART. This change lets ftdi_sio probe the VID/PID and registers only interface gregkh#3 as a TTY, leaving the rest available for other drivers. [1] usb 5-1.3: new high-speed USB device number 11 using xhci_hcd usb 5-1.3: New USB device found, idVendor=1546, idProduct=0506, bcdDevice= 8.00 usb 5-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0 usb 5-1.3: Product: EVK-M101 usb 5-1.3: Manufacturer: u-blox AG Datasheet: https://content.u-blox.com/sites/default/files/documents/EVK-M10_UserGuide_UBX-21003949.pdf Signed-off-by: Oleksandr Suvorov <[email protected]> Link: https://lore.kernel.org/[email protected]/ Cc: [email protected] Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit 84bbe32 ] On completion of i915_vma_pin_ww(), a synchronous variant of dma_fence_work_commit() is called. When pinning a VMA to GGTT address space on a Cherry View family processor, or on a Broxton generation SoC with VTD enabled, i.e., when stop_machine() is then called from intel_ggtt_bind_vma(), that can potentially lead to lock inversion among reservation_ww and cpu_hotplug locks. [86.861179] ====================================================== [86.861193] WARNING: possible circular locking dependency detected [86.861209] 6.15.0-rc5-CI_DRM_16515-gca0305cadc2d+ gregkh#1 Tainted: G U [86.861226] ------------------------------------------------------ [86.861238] i915_module_loa/1432 is trying to acquire lock: [86.861252] ffffffff83489090 (cpu_hotplug_lock){++++}-{0:0}, at: stop_machine+0x1c/0x50 [86.861290] but task is already holding lock: [86.861303] ffffc90002e0b4c8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915] [86.862233] which lock already depends on the new lock. [86.862251] the existing dependency chain (in reverse order) is: [86.862265] -> gregkh#5 (reservation_ww_class_mutex){+.+.}-{3:3}: [86.862292] dma_resv_lockdep+0x19a/0x390 [86.862315] do_one_initcall+0x60/0x3f0 [86.862334] kernel_init_freeable+0x3cd/0x680 [86.862353] kernel_init+0x1b/0x200 [86.862369] ret_from_fork+0x47/0x70 [86.862383] ret_from_fork_asm+0x1a/0x30 [86.862399] -> gregkh#4 (reservation_ww_class_acquire){+.+.}-{0:0}: [86.862425] dma_resv_lockdep+0x178/0x390 [86.862440] do_one_initcall+0x60/0x3f0 [86.862454] kernel_init_freeable+0x3cd/0x680 [86.862470] kernel_init+0x1b/0x200 [86.862482] ret_from_fork+0x47/0x70 [86.862495] ret_from_fork_asm+0x1a/0x30 [86.862509] -> gregkh#3 (&mm->mmap_lock){++++}-{3:3}: [86.862531] down_read_killable+0x46/0x1e0 [86.862546] lock_mm_and_find_vma+0xa2/0x280 [86.862561] do_user_addr_fault+0x266/0x8e0 [86.862578] exc_page_fault+0x8a/0x2f0 [86.862593] asm_exc_page_fault+0x27/0x30 [86.862607] filldir64+0xeb/0x180 [86.862620] kernfs_fop_readdir+0x118/0x480 [86.862635] iterate_dir+0xcf/0x2b0 [86.862648] __x64_sys_getdents64+0x84/0x140 [86.862661] x64_sys_call+0x1058/0x2660 [86.862675] do_syscall_64+0x91/0xe90 [86.862689] entry_SYSCALL_64_after_hwframe+0x76/0x7e [86.862703] -> gregkh#2 (&root->kernfs_rwsem){++++}-{3:3}: [86.862725] down_write+0x3e/0xf0 [86.862738] kernfs_add_one+0x30/0x3c0 [86.862751] kernfs_create_dir_ns+0x53/0xb0 [86.862765] internal_create_group+0x134/0x4c0 [86.862779] sysfs_create_group+0x13/0x20 [86.862792] topology_add_dev+0x1d/0x30 [86.862806] cpuhp_invoke_callback+0x4b5/0x850 [86.862822] cpuhp_issue_call+0xbf/0x1f0 [86.862836] __cpuhp_setup_state_cpuslocked+0x111/0x320 [86.862852] __cpuhp_setup_state+0xb0/0x220 [86.862866] topology_sysfs_init+0x30/0x50 [86.862879] do_one_initcall+0x60/0x3f0 [86.862893] kernel_init_freeable+0x3cd/0x680 [86.862908] kernel_init+0x1b/0x200 [86.862921] ret_from_fork+0x47/0x70 [86.862934] ret_from_fork_asm+0x1a/0x30 [86.862947] -> gregkh#1 (cpuhp_state_mutex){+.+.}-{3:3}: [86.862969] __mutex_lock+0xaa/0xed0 [86.862982] mutex_lock_nested+0x1b/0x30 [86.862995] __cpuhp_setup_state_cpuslocked+0x67/0x320 [86.863012] __cpuhp_setup_state+0xb0/0x220 [86.863026] page_alloc_init_cpuhp+0x2d/0x60 [86.863041] mm_core_init+0x22/0x2d0 [86.863054] start_kernel+0x576/0xbd0 [86.863068] x86_64_start_reservations+0x18/0x30 [86.863084] x86_64_start_kernel+0xbf/0x110 [86.863098] common_startup_64+0x13e/0x141 [86.863114] -> #0 (cpu_hotplug_lock){++++}-{0:0}: [86.863135] __lock_acquire+0x1635/0x2810 [86.863152] lock_acquire+0xc4/0x2f0 [86.863166] cpus_read_lock+0x41/0x100 [86.863180] stop_machine+0x1c/0x50 [86.863194] bxt_vtd_ggtt_insert_entries__BKL+0x3b/0x60 [i915] [86.863987] intel_ggtt_bind_vma+0x43/0x70 [i915] [86.864735] __vma_bind+0x55/0x70 [i915] [86.865510] fence_work+0x26/0xa0 [i915] [86.866248] fence_notify+0xa1/0x140 [i915] [86.866983] __i915_sw_fence_complete+0x8f/0x270 [i915] [86.867719] i915_sw_fence_commit+0x39/0x60 [i915] [86.868453] i915_vma_pin_ww+0x462/0x1360 [i915] [86.869228] i915_vma_pin.constprop.0+0x133/0x1d0 [i915] [86.870001] initial_plane_vma+0x307/0x840 [i915] [86.870774] intel_initial_plane_config+0x33f/0x670 [i915] [86.871546] intel_display_driver_probe_nogem+0x1c6/0x260 [i915] [86.872330] i915_driver_probe+0x7fa/0xe80 [i915] [86.873057] i915_pci_probe+0xe6/0x220 [i915] [86.873782] local_pci_probe+0x47/0xb0 [86.873802] pci_device_probe+0xf3/0x260 [86.873817] really_probe+0xf1/0x3c0 [86.873833] __driver_probe_device+0x8c/0x180 [86.873848] driver_probe_device+0x24/0xd0 [86.873862] __driver_attach+0x10f/0x220 [86.873876] bus_for_each_dev+0x7f/0xe0 [86.873892] driver_attach+0x1e/0x30 [86.873904] bus_add_driver+0x151/0x290 [86.873917] driver_register+0x5e/0x130 [86.873931] __pci_register_driver+0x7d/0x90 [86.873945] i915_pci_register_driver+0x23/0x30 [i915] [86.874678] i915_init+0x37/0x120 [i915] [86.875347] do_one_initcall+0x60/0x3f0 [86.875369] do_init_module+0x97/0x2a0 [86.875385] load_module+0x2c54/0x2d80 [86.875398] init_module_from_file+0x96/0xe0 [86.875413] idempotent_init_module+0x117/0x330 [86.875426] __x64_sys_finit_module+0x77/0x100 [86.875440] x64_sys_call+0x24de/0x2660 [86.875454] do_syscall_64+0x91/0xe90 [86.875470] entry_SYSCALL_64_after_hwframe+0x76/0x7e [86.875486] other info that might help us debug this: [86.875502] Chain exists of: cpu_hotplug_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex [86.875539] Possible unsafe locking scenario: [86.875552] CPU0 CPU1 [86.875563] ---- ---- [86.875573] lock(reservation_ww_class_mutex); [86.875588] lock(reservation_ww_class_acquire); [86.875606] lock(reservation_ww_class_mutex); [86.875624] rlock(cpu_hotplug_lock); [86.875637] *** DEADLOCK *** [86.875650] 3 locks held by i915_module_loa/1432: [86.875663] #0: ffff888101f5c1b0 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x104/0x220 [86.875699] gregkh#1: ffffc90002e0b4a0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915] [86.876512] gregkh#2: ffffc90002e0b4c8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915] [86.877305] stack backtrace: [86.877326] CPU: 0 UID: 0 PID: 1432 Comm: i915_module_loa Tainted: G U 6.15.0-rc5-CI_DRM_16515-gca0305cadc2d+ gregkh#1 PREEMPT(voluntary) [86.877334] Tainted: [U]=USER [86.877336] Hardware name: /NUC5CPYB, BIOS PYBSWCEL.86A.0079.2020.0420.1316 04/20/2020 [86.877339] Call Trace: [86.877344] <TASK> [86.877353] dump_stack_lvl+0x91/0xf0 [86.877364] dump_stack+0x10/0x20 [86.877369] print_circular_bug+0x285/0x360 [86.877379] check_noncircular+0x135/0x150 [86.877390] __lock_acquire+0x1635/0x2810 [86.877403] lock_acquire+0xc4/0x2f0 [86.877408] ? stop_machine+0x1c/0x50 [86.877422] ? __pfx_bxt_vtd_ggtt_insert_entries__cb+0x10/0x10 [i915] [86.878173] cpus_read_lock+0x41/0x100 [86.878182] ? stop_machine+0x1c/0x50 [86.878191] ? __pfx_bxt_vtd_ggtt_insert_entries__cb+0x10/0x10 [i915] [86.878916] stop_machine+0x1c/0x50 [86.878927] bxt_vtd_ggtt_insert_entries__BKL+0x3b/0x60 [i915] [86.879652] intel_ggtt_bind_vma+0x43/0x70 [i915] [86.880375] __vma_bind+0x55/0x70 [i915] [86.881133] fence_work+0x26/0xa0 [i915] [86.881851] fence_notify+0xa1/0x140 [i915] [86.882566] __i915_sw_fence_complete+0x8f/0x270 [i915] [86.883286] i915_sw_fence_commit+0x39/0x60 [i915] [86.884003] i915_vma_pin_ww+0x462/0x1360 [i915] [86.884756] ? i915_vma_pin.constprop.0+0x6c/0x1d0 [i915] [86.885513] i915_vma_pin.constprop.0+0x133/0x1d0 [i915] [86.886281] initial_plane_vma+0x307/0x840 [i915] [86.887049] intel_initial_plane_config+0x33f/0x670 [i915] [86.887819] intel_display_driver_probe_nogem+0x1c6/0x260 [i915] [86.888587] i915_driver_probe+0x7fa/0xe80 [i915] [86.889293] ? mutex_unlock+0x12/0x20 [86.889301] ? drm_privacy_screen_get+0x171/0x190 [86.889308] ? acpi_dev_found+0x66/0x80 [86.889321] i915_pci_probe+0xe6/0x220 [i915] [86.890038] local_pci_probe+0x47/0xb0 [86.890049] pci_device_probe+0xf3/0x260 [86.890058] really_probe+0xf1/0x3c0 [86.890067] __driver_probe_device+0x8c/0x180 [86.890072] driver_probe_device+0x24/0xd0 [86.890078] __driver_attach+0x10f/0x220 [86.890083] ? __pfx___driver_attach+0x10/0x10 [86.890088] bus_for_each_dev+0x7f/0xe0 [86.890097] driver_attach+0x1e/0x30 [86.890101] bus_add_driver+0x151/0x290 [86.890107] driver_register+0x5e/0x130 [86.890113] __pci_register_driver+0x7d/0x90 [86.890119] i915_pci_register_driver+0x23/0x30 [i915] [86.890833] i915_init+0x37/0x120 [i915] [86.891482] ? __pfx_i915_init+0x10/0x10 [i915] [86.892135] do_one_initcall+0x60/0x3f0 [86.892145] ? __kmalloc_cache_noprof+0x33f/0x470 [86.892157] do_init_module+0x97/0x2a0 [86.892164] load_module+0x2c54/0x2d80 [86.892168] ? __kernel_read+0x15c/0x300 [86.892185] ? kernel_read_file+0x2b1/0x320 [86.892195] init_module_from_file+0x96/0xe0 [86.892199] ? init_module_from_file+0x96/0xe0 [86.892211] idempotent_init_module+0x117/0x330 [86.892224] __x64_sys_finit_module+0x77/0x100 [86.892230] x64_sys_call+0x24de/0x2660 [86.892236] do_syscall_64+0x91/0xe90 [86.892243] ? irqentry_exit+0x77/0xb0 [86.892249] ? sysvec_apic_timer_interrupt+0x57/0xc0 [86.892256] entry_SYSCALL_64_after_hwframe+0x76/0x7e [86.892261] RIP: 0033:0x7303e1b2725d [86.892271] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48 [86.892276] RSP: 002b:00007ffddd1fdb38 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [86.892281] RAX: ffffffffffffffda RBX: 00005d771d88fd90 RCX: 00007303e1b2725d [86.892285] RDX: 0000000000000000 RSI: 00005d771d893aa0 RDI: 000000000000000c [86.892287] RBP: 00007ffddd1fdbf0 R08: 0000000000000040 R09: 00007ffddd1fdb80 [86.892289] R10: 00007303e1c03b20 R11: 0000000000000246 R12: 00005d771d893aa0 [86.892292] R13: 0000000000000000 R14: 00005d771d88f0d0 R15: 00005d771d895710 [86.892304] </TASK> Call asynchronous variant of dma_fence_work_commit() in that case. v3: Provide more verbose in-line comment (Andi), - mention target environments in commit message. Fixes: 7d1c261 ("drm/i915: Take reservation lock around i915_vma_pin.") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14985 Cc: Andi Shyti <[email protected]> Signed-off-by: Janusz Krzysztofik <[email protected]> Reviewed-by: Sebastian Brzezinka <[email protected]> Reviewed-by: Krzysztof Karas <[email protected]> Acked-by: Andi Shyti <[email protected]> Signed-off-by: Andi Shyti <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit 648ef13) Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

commit 9d274c1 upstream. We have been seeing crashes on duplicate keys in btrfs_set_item_key_safe(): BTRFS critical (device vdb): slot 4 key (450 108 8192) new key (450 108 8192) ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:2620! invalid opcode: 0000 [gregkh#1] PREEMPT SMP PTI CPU: 0 PID: 3139 Comm: xfs_io Kdump: loaded Not tainted 6.9.0 gregkh#6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:btrfs_set_item_key_safe+0x11f/0x290 [btrfs] With the following stack trace: #0 btrfs_set_item_key_safe (fs/btrfs/ctree.c:2620:4) gregkh#1 btrfs_drop_extents (fs/btrfs/file.c:411:4) gregkh#2 log_one_extent (fs/btrfs/tree-log.c:4732:9) gregkh#3 btrfs_log_changed_extents (fs/btrfs/tree-log.c:4955:9) gregkh#4 btrfs_log_inode (fs/btrfs/tree-log.c:6626:9) gregkh#5 btrfs_log_inode_parent (fs/btrfs/tree-log.c:7070:8) gregkh#6 btrfs_log_dentry_safe (fs/btrfs/tree-log.c:7171:8) gregkh#7 btrfs_sync_file (fs/btrfs/file.c:1933:8) gregkh#8 vfs_fsync_range (fs/sync.c:188:9) gregkh#9 vfs_fsync (fs/sync.c:202:9) gregkh#10 do_fsync (fs/sync.c:212:9) gregkh#11 __do_sys_fdatasync (fs/sync.c:225:9) gregkh#12 __se_sys_fdatasync (fs/sync.c:223:1) gregkh#13 __x64_sys_fdatasync (fs/sync.c:223:1) gregkh#14 do_syscall_x64 (arch/x86/entry/common.c:52:14) gregkh#15 do_syscall_64 (arch/x86/entry/common.c:83:7) gregkh#16 entry_SYSCALL_64+0xaf/0x14c (arch/x86/entry/entry_64.S:121) So we're logging a changed extent from fsync, which is splitting an extent in the log tree. But this split part already exists in the tree, triggering the BUG(). This is the state of the log tree at the time of the crash, dumped with drgn (https://github.com/osandov/drgn/blob/main/contrib/btrfs_tree.py) to get more details than btrfs_print_leaf() gives us: >>> print_extent_buffer(prog.crashed_thread().stack_trace()[0]["eb"]) leaf 33439744 level 0 items 72 generation 9 owner 18446744073709551610 leaf 33439744 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da item 0 key (450 INODE_ITEM 0) itemoff 16123 itemsize 160 generation 7 transid 9 size 8192 nbytes 8473563889606862198 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 204 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417704.983333333 (2024-05-22 15:41:44) mtime 1716417704.983333333 (2024-05-22 15:41:44) otime 17592186044416.000000000 (559444-03-08 01:40:16) item 1 key (450 INODE_REF 256) itemoff 16110 itemsize 13 index 195 namelen 3 name: 193 item 2 key (450 XATTR_ITEM 1640047104) itemoff 16073 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 3 key (450 EXTENT_DATA 0) itemoff 16020 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 4096 ram 12288 extent compression 0 (none) item 4 key (450 EXTENT_DATA 4096) itemoff 15967 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 4096 nr 8192 item 5 key (450 EXTENT_DATA 8192) itemoff 15914 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 ... So the real problem happened earlier: notice that items 4 (4k-12k) and 5 (8k-12k) overlap. Both are prealloc extents. Item 4 straddles i_size and item 5 starts at i_size. Here is the state of the filesystem tree at the time of the crash: >>> root = prog.crashed_thread().stack_trace()[2]["inode"].root >>> ret, nodes, slots = btrfs_search_slot(root, BtrfsKey(450, 0, 0)) >>> print_extent_buffer(nodes[0]) leaf 30425088 level 0 items 184 generation 9 owner 5 leaf 30425088 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da ... item 179 key (450 INODE_ITEM 0) itemoff 4907 itemsize 160 generation 7 transid 7 size 4096 nbytes 12288 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 6 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417703.220000000 (2024-05-22 15:41:43) mtime 1716417703.220000000 (2024-05-22 15:41:43) otime 1716417703.220000000 (2024-05-22 15:41:43) item 180 key (450 INODE_REF 256) itemoff 4894 itemsize 13 index 195 namelen 3 name: 193 item 181 key (450 XATTR_ITEM 1640047104) itemoff 4857 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 182 key (450 EXTENT_DATA 0) itemoff 4804 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 8192 ram 12288 extent compression 0 (none) item 183 key (450 EXTENT_DATA 8192) itemoff 4751 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 Item 5 in the log tree corresponds to item 183 in the filesystem tree, but nothing matches item 4. Furthermore, item 183 is the last item in the leaf. btrfs_log_prealloc_extents() is responsible for logging prealloc extents beyond i_size. It first truncates any previously logged prealloc extents that start beyond i_size. Then, it walks the filesystem tree and copies the prealloc extent items to the log tree. If it hits the end of a leaf, then it calls btrfs_next_leaf(), which unlocks the tree and does another search. However, while the filesystem tree is unlocked, an ordered extent completion may modify the tree. In particular, it may insert an extent item that overlaps with an extent item that was already copied to the log tree. This may manifest in several ways depending on the exact scenario, including an EEXIST error that is silently translated to a full sync, overlapping items in the log tree, or this crash. This particular crash is triggered by the following sequence of events: - Initially, the file has i_size=4k, a regular extent from 0-4k, and a prealloc extent beyond i_size from 4k-12k. The prealloc extent item is the last item in its B-tree leaf. - The file is fsync'd, which copies its inode item and both extent items to the log tree. - An xattr is set on the file, which sets the BTRFS_INODE_COPY_EVERYTHING flag. - The range 4k-8k in the file is written using direct I/O. i_size is extended to 8k, but the ordered extent is still in flight. - The file is fsync'd. Since BTRFS_INODE_COPY_EVERYTHING is set, this calls copy_inode_items_to_log(), which calls btrfs_log_prealloc_extents(). - btrfs_log_prealloc_extents() finds the 4k-12k prealloc extent in the filesystem tree. Since it starts before i_size, it skips it. Since it is the last item in its B-tree leaf, it calls btrfs_next_leaf(). - btrfs_next_leaf() unlocks the path. - The ordered extent completion runs, which converts the 4k-8k part of the prealloc extent to written and inserts the remaining prealloc part from 8k-12k. - btrfs_next_leaf() does a search and finds the new prealloc extent 8k-12k. - btrfs_log_prealloc_extents() copies the 8k-12k prealloc extent into the log tree. Note that it overlaps with the 4k-12k prealloc extent that was copied to the log tree by the first fsync. - fsync calls btrfs_log_changed_extents(), which tries to log the 4k-8k extent that was written. - This tries to drop the range 4k-8k in the log tree, which requires adjusting the start of the 4k-12k prealloc extent in the log tree to 8k. - btrfs_set_item_key_safe() sees that there is already an extent starting at 8k in the log tree and calls BUG(). Fix this by detecting when we're about to insert an overlapping file extent item in the log tree and truncating the part that would overlap. CC: [email protected] # 6.1+ Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Harshvardhan Jha <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 2d8ab77 upstream. The U-Blox EVK-M101 enumerates as 1546:0506 [1] with four FTDI interfaces: - EVK-M101 current sensors - EVK-M101 I2C - EVK-M101 UART - EVK-M101 port D Only the third USB interface is a UART. This change lets ftdi_sio probe the VID/PID and registers only interface gregkh#3 as a TTY, leaving the rest available for other drivers. [1] usb 5-1.3: new high-speed USB device number 11 using xhci_hcd usb 5-1.3: New USB device found, idVendor=1546, idProduct=0506, bcdDevice= 8.00 usb 5-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0 usb 5-1.3: Product: EVK-M101 usb 5-1.3: Manufacturer: u-blox AG Datasheet: https://content.u-blox.com/sites/default/files/documents/EVK-M10_UserGuide_UBX-21003949.pdf Signed-off-by: Oleksandr Suvorov <[email protected]> Link: https://lore.kernel.org/[email protected]/ Cc: [email protected] Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit 84bbe32 ] On completion of i915_vma_pin_ww(), a synchronous variant of dma_fence_work_commit() is called. When pinning a VMA to GGTT address space on a Cherry View family processor, or on a Broxton generation SoC with VTD enabled, i.e., when stop_machine() is then called from intel_ggtt_bind_vma(), that can potentially lead to lock inversion among reservation_ww and cpu_hotplug locks. [86.861179] ====================================================== [86.861193] WARNING: possible circular locking dependency detected [86.861209] 6.15.0-rc5-CI_DRM_16515-gca0305cadc2d+ gregkh#1 Tainted: G U [86.861226] ------------------------------------------------------ [86.861238] i915_module_loa/1432 is trying to acquire lock: [86.861252] ffffffff83489090 (cpu_hotplug_lock){++++}-{0:0}, at: stop_machine+0x1c/0x50 [86.861290] but task is already holding lock: [86.861303] ffffc90002e0b4c8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915] [86.862233] which lock already depends on the new lock. [86.862251] the existing dependency chain (in reverse order) is: [86.862265] -> gregkh#5 (reservation_ww_class_mutex){+.+.}-{3:3}: [86.862292] dma_resv_lockdep+0x19a/0x390 [86.862315] do_one_initcall+0x60/0x3f0 [86.862334] kernel_init_freeable+0x3cd/0x680 [86.862353] kernel_init+0x1b/0x200 [86.862369] ret_from_fork+0x47/0x70 [86.862383] ret_from_fork_asm+0x1a/0x30 [86.862399] -> gregkh#4 (reservation_ww_class_acquire){+.+.}-{0:0}: [86.862425] dma_resv_lockdep+0x178/0x390 [86.862440] do_one_initcall+0x60/0x3f0 [86.862454] kernel_init_freeable+0x3cd/0x680 [86.862470] kernel_init+0x1b/0x200 [86.862482] ret_from_fork+0x47/0x70 [86.862495] ret_from_fork_asm+0x1a/0x30 [86.862509] -> gregkh#3 (&mm->mmap_lock){++++}-{3:3}: [86.862531] down_read_killable+0x46/0x1e0 [86.862546] lock_mm_and_find_vma+0xa2/0x280 [86.862561] do_user_addr_fault+0x266/0x8e0 [86.862578] exc_page_fault+0x8a/0x2f0 [86.862593] asm_exc_page_fault+0x27/0x30 [86.862607] filldir64+0xeb/0x180 [86.862620] kernfs_fop_readdir+0x118/0x480 [86.862635] iterate_dir+0xcf/0x2b0 [86.862648] __x64_sys_getdents64+0x84/0x140 [86.862661] x64_sys_call+0x1058/0x2660 [86.862675] do_syscall_64+0x91/0xe90 [86.862689] entry_SYSCALL_64_after_hwframe+0x76/0x7e [86.862703] -> gregkh#2 (&root->kernfs_rwsem){++++}-{3:3}: [86.862725] down_write+0x3e/0xf0 [86.862738] kernfs_add_one+0x30/0x3c0 [86.862751] kernfs_create_dir_ns+0x53/0xb0 [86.862765] internal_create_group+0x134/0x4c0 [86.862779] sysfs_create_group+0x13/0x20 [86.862792] topology_add_dev+0x1d/0x30 [86.862806] cpuhp_invoke_callback+0x4b5/0x850 [86.862822] cpuhp_issue_call+0xbf/0x1f0 [86.862836] __cpuhp_setup_state_cpuslocked+0x111/0x320 [86.862852] __cpuhp_setup_state+0xb0/0x220 [86.862866] topology_sysfs_init+0x30/0x50 [86.862879] do_one_initcall+0x60/0x3f0 [86.862893] kernel_init_freeable+0x3cd/0x680 [86.862908] kernel_init+0x1b/0x200 [86.862921] ret_from_fork+0x47/0x70 [86.862934] ret_from_fork_asm+0x1a/0x30 [86.862947] -> gregkh#1 (cpuhp_state_mutex){+.+.}-{3:3}: [86.862969] __mutex_lock+0xaa/0xed0 [86.862982] mutex_lock_nested+0x1b/0x30 [86.862995] __cpuhp_setup_state_cpuslocked+0x67/0x320 [86.863012] __cpuhp_setup_state+0xb0/0x220 [86.863026] page_alloc_init_cpuhp+0x2d/0x60 [86.863041] mm_core_init+0x22/0x2d0 [86.863054] start_kernel+0x576/0xbd0 [86.863068] x86_64_start_reservations+0x18/0x30 [86.863084] x86_64_start_kernel+0xbf/0x110 [86.863098] common_startup_64+0x13e/0x141 [86.863114] -> #0 (cpu_hotplug_lock){++++}-{0:0}: [86.863135] __lock_acquire+0x1635/0x2810 [86.863152] lock_acquire+0xc4/0x2f0 [86.863166] cpus_read_lock+0x41/0x100 [86.863180] stop_machine+0x1c/0x50 [86.863194] bxt_vtd_ggtt_insert_entries__BKL+0x3b/0x60 [i915] [86.863987] intel_ggtt_bind_vma+0x43/0x70 [i915] [86.864735] __vma_bind+0x55/0x70 [i915] [86.865510] fence_work+0x26/0xa0 [i915] [86.866248] fence_notify+0xa1/0x140 [i915] [86.866983] __i915_sw_fence_complete+0x8f/0x270 [i915] [86.867719] i915_sw_fence_commit+0x39/0x60 [i915] [86.868453] i915_vma_pin_ww+0x462/0x1360 [i915] [86.869228] i915_vma_pin.constprop.0+0x133/0x1d0 [i915] [86.870001] initial_plane_vma+0x307/0x840 [i915] [86.870774] intel_initial_plane_config+0x33f/0x670 [i915] [86.871546] intel_display_driver_probe_nogem+0x1c6/0x260 [i915] [86.872330] i915_driver_probe+0x7fa/0xe80 [i915] [86.873057] i915_pci_probe+0xe6/0x220 [i915] [86.873782] local_pci_probe+0x47/0xb0 [86.873802] pci_device_probe+0xf3/0x260 [86.873817] really_probe+0xf1/0x3c0 [86.873833] __driver_probe_device+0x8c/0x180 [86.873848] driver_probe_device+0x24/0xd0 [86.873862] __driver_attach+0x10f/0x220 [86.873876] bus_for_each_dev+0x7f/0xe0 [86.873892] driver_attach+0x1e/0x30 [86.873904] bus_add_driver+0x151/0x290 [86.873917] driver_register+0x5e/0x130 [86.873931] __pci_register_driver+0x7d/0x90 [86.873945] i915_pci_register_driver+0x23/0x30 [i915] [86.874678] i915_init+0x37/0x120 [i915] [86.875347] do_one_initcall+0x60/0x3f0 [86.875369] do_init_module+0x97/0x2a0 [86.875385] load_module+0x2c54/0x2d80 [86.875398] init_module_from_file+0x96/0xe0 [86.875413] idempotent_init_module+0x117/0x330 [86.875426] __x64_sys_finit_module+0x77/0x100 [86.875440] x64_sys_call+0x24de/0x2660 [86.875454] do_syscall_64+0x91/0xe90 [86.875470] entry_SYSCALL_64_after_hwframe+0x76/0x7e [86.875486] other info that might help us debug this: [86.875502] Chain exists of: cpu_hotplug_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex [86.875539] Possible unsafe locking scenario: [86.875552] CPU0 CPU1 [86.875563] ---- ---- [86.875573] lock(reservation_ww_class_mutex); [86.875588] lock(reservation_ww_class_acquire); [86.875606] lock(reservation_ww_class_mutex); [86.875624] rlock(cpu_hotplug_lock); [86.875637] *** DEADLOCK *** [86.875650] 3 locks held by i915_module_loa/1432: [86.875663] #0: ffff888101f5c1b0 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x104/0x220 [86.875699] gregkh#1: ffffc90002e0b4a0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915] [86.876512] gregkh#2: ffffc90002e0b4c8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915] [86.877305] stack backtrace: [86.877326] CPU: 0 UID: 0 PID: 1432 Comm: i915_module_loa Tainted: G U 6.15.0-rc5-CI_DRM_16515-gca0305cadc2d+ gregkh#1 PREEMPT(voluntary) [86.877334] Tainted: [U]=USER [86.877336] Hardware name: /NUC5CPYB, BIOS PYBSWCEL.86A.0079.2020.0420.1316 04/20/2020 [86.877339] Call Trace: [86.877344] <TASK> [86.877353] dump_stack_lvl+0x91/0xf0 [86.877364] dump_stack+0x10/0x20 [86.877369] print_circular_bug+0x285/0x360 [86.877379] check_noncircular+0x135/0x150 [86.877390] __lock_acquire+0x1635/0x2810 [86.877403] lock_acquire+0xc4/0x2f0 [86.877408] ? stop_machine+0x1c/0x50 [86.877422] ? __pfx_bxt_vtd_ggtt_insert_entries__cb+0x10/0x10 [i915] [86.878173] cpus_read_lock+0x41/0x100 [86.878182] ? stop_machine+0x1c/0x50 [86.878191] ? __pfx_bxt_vtd_ggtt_insert_entries__cb+0x10/0x10 [i915] [86.878916] stop_machine+0x1c/0x50 [86.878927] bxt_vtd_ggtt_insert_entries__BKL+0x3b/0x60 [i915] [86.879652] intel_ggtt_bind_vma+0x43/0x70 [i915] [86.880375] __vma_bind+0x55/0x70 [i915] [86.881133] fence_work+0x26/0xa0 [i915] [86.881851] fence_notify+0xa1/0x140 [i915] [86.882566] __i915_sw_fence_complete+0x8f/0x270 [i915] [86.883286] i915_sw_fence_commit+0x39/0x60 [i915] [86.884003] i915_vma_pin_ww+0x462/0x1360 [i915] [86.884756] ? i915_vma_pin.constprop.0+0x6c/0x1d0 [i915] [86.885513] i915_vma_pin.constprop.0+0x133/0x1d0 [i915] [86.886281] initial_plane_vma+0x307/0x840 [i915] [86.887049] intel_initial_plane_config+0x33f/0x670 [i915] [86.887819] intel_display_driver_probe_nogem+0x1c6/0x260 [i915] [86.888587] i915_driver_probe+0x7fa/0xe80 [i915] [86.889293] ? mutex_unlock+0x12/0x20 [86.889301] ? drm_privacy_screen_get+0x171/0x190 [86.889308] ? acpi_dev_found+0x66/0x80 [86.889321] i915_pci_probe+0xe6/0x220 [i915] [86.890038] local_pci_probe+0x47/0xb0 [86.890049] pci_device_probe+0xf3/0x260 [86.890058] really_probe+0xf1/0x3c0 [86.890067] __driver_probe_device+0x8c/0x180 [86.890072] driver_probe_device+0x24/0xd0 [86.890078] __driver_attach+0x10f/0x220 [86.890083] ? __pfx___driver_attach+0x10/0x10 [86.890088] bus_for_each_dev+0x7f/0xe0 [86.890097] driver_attach+0x1e/0x30 [86.890101] bus_add_driver+0x151/0x290 [86.890107] driver_register+0x5e/0x130 [86.890113] __pci_register_driver+0x7d/0x90 [86.890119] i915_pci_register_driver+0x23/0x30 [i915] [86.890833] i915_init+0x37/0x120 [i915] [86.891482] ? __pfx_i915_init+0x10/0x10 [i915] [86.892135] do_one_initcall+0x60/0x3f0 [86.892145] ? __kmalloc_cache_noprof+0x33f/0x470 [86.892157] do_init_module+0x97/0x2a0 [86.892164] load_module+0x2c54/0x2d80 [86.892168] ? __kernel_read+0x15c/0x300 [86.892185] ? kernel_read_file+0x2b1/0x320 [86.892195] init_module_from_file+0x96/0xe0 [86.892199] ? init_module_from_file+0x96/0xe0 [86.892211] idempotent_init_module+0x117/0x330 [86.892224] __x64_sys_finit_module+0x77/0x100 [86.892230] x64_sys_call+0x24de/0x2660 [86.892236] do_syscall_64+0x91/0xe90 [86.892243] ? irqentry_exit+0x77/0xb0 [86.892249] ? sysvec_apic_timer_interrupt+0x57/0xc0 [86.892256] entry_SYSCALL_64_after_hwframe+0x76/0x7e [86.892261] RIP: 0033:0x7303e1b2725d [86.892271] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48 [86.892276] RSP: 002b:00007ffddd1fdb38 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [86.892281] RAX: ffffffffffffffda RBX: 00005d771d88fd90 RCX: 00007303e1b2725d [86.892285] RDX: 0000000000000000 RSI: 00005d771d893aa0 RDI: 000000000000000c [86.892287] RBP: 00007ffddd1fdbf0 R08: 0000000000000040 R09: 00007ffddd1fdb80 [86.892289] R10: 00007303e1c03b20 R11: 0000000000000246 R12: 00005d771d893aa0 [86.892292] R13: 0000000000000000 R14: 00005d771d88f0d0 R15: 00005d771d895710 [86.892304] </TASK> Call asynchronous variant of dma_fence_work_commit() in that case. v3: Provide more verbose in-line comment (Andi), - mention target environments in commit message. Fixes: 7d1c261 ("drm/i915: Take reservation lock around i915_vma_pin.") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14985 Cc: Andi Shyti <[email protected]> Signed-off-by: Janusz Krzysztofik <[email protected]> Reviewed-by: Sebastian Brzezinka <[email protected]> Reviewed-by: Krzysztof Karas <[email protected]> Acked-by: Andi Shyti <[email protected]> Signed-off-by: Andi Shyti <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit 648ef13) Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

commit 2d8ab77 upstream. The U-Blox EVK-M101 enumerates as 1546:0506 [1] with four FTDI interfaces: - EVK-M101 current sensors - EVK-M101 I2C - EVK-M101 UART - EVK-M101 port D Only the third USB interface is a UART. This change lets ftdi_sio probe the VID/PID and registers only interface gregkh#3 as a TTY, leaving the rest available for other drivers. [1] usb 5-1.3: new high-speed USB device number 11 using xhci_hcd usb 5-1.3: New USB device found, idVendor=1546, idProduct=0506, bcdDevice= 8.00 usb 5-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0 usb 5-1.3: Product: EVK-M101 usb 5-1.3: Manufacturer: u-blox AG Datasheet: https://content.u-blox.com/sites/default/files/documents/EVK-M10_UserGuide_UBX-21003949.pdf Signed-off-by: Oleksandr Suvorov <[email protected]> Link: https://lore.kernel.org/[email protected]/ Cc: [email protected] Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

… 'T' When perf report with annotation for a symbol, press 's' and 'T', then exit the annotate browser. Once annotate the same symbol, the annotate browser will crash. The browser.arch was required to be correctly updated when data type feature was enabled by 'T'. Usually it was initialized by symbol__annotate2 function. If a symbol has already been correctly annotated at the first time, it should not call the symbol__annotate2 function again, thus the browser.arch will not get initialized. Then at the second time to show the annotate browser, the data type needs to be displayed but the browser.arch is empty. Stack trace as below: Perf: Segmentation fault -------- backtrace -------- #0 0x55d365 in ui__signal_backtrace setup.c:0 gregkh#1 0x7f5ff1a3e930 in __restore_rt libc.so.6[3e930] gregkh#2 0x570f08 in arch__is perf[570f08] gregkh#3 0x562186 in annotate_get_insn_location perf[562186] gregkh#4 0x562626 in __hist_entry__get_data_type annotate.c:0 gregkh#5 0x56476d in annotation_line__write perf[56476d] gregkh#6 0x54e2db in annotate_browser__write annotate.c:0 gregkh#7 0x54d061 in ui_browser__list_head_refresh perf[54d061] gregkh#8 0x54dc9e in annotate_browser__refresh annotate.c:0 gregkh#9 0x54c03d in __ui_browser__refresh browser.c:0 gregkh#10 0x54ccf8 in ui_browser__run perf[54ccf8] gregkh#11 0x54eb92 in __hist_entry__tui_annotate perf[54eb92] gregkh#12 0x552293 in do_annotate hists.c:0 gregkh#13 0x55941c in evsel__hists_browse hists.c:0 gregkh#14 0x55b00f in evlist__tui_browse_hists perf[55b00f] gregkh#15 0x42ff02 in cmd_report perf[42ff02] gregkh#16 0x494008 in run_builtin perf.c:0 gregkh#17 0x494305 in handle_internal_command perf.c:0 gregkh#18 0x410547 in main perf[410547] gregkh#19 0x7f5ff1a295d0 in __libc_start_call_main libc.so.6[295d0] gregkh#20 0x7f5ff1a29680 in __libc_start_main@@GLIBC_2.34 libc.so.6[29680] gregkh#21 0x410b75 in _start perf[410b75] Fixes: 1d4374a ("perf annotate: Add 'T' hot key to toggle data type display") Reviewed-by: James Clark <[email protected]> Tested-by: Namhyung Kim <[email protected]> Signed-off-by: Tianyou Li <[email protected]> Signed-off-by: Namhyung Kim <[email protected]>

When using perf record with the `--overwrite` option, a segmentation fault occurs if an event fails to open. For example: perf record -e cycles-ct -F 1000 -a --overwrite Error: cycles-ct:H: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' perf: Segmentation fault #0 0x6466b6 in dump_stack debug.c:366 gregkh#1 0x646729 in sighandler_dump_stack debug.c:378 gregkh#2 0x453fd1 in sigsegv_handler builtin-record.c:722 gregkh#3 0x7f8454e65090 in __restore_rt libc-2.32.so[54090] gregkh#4 0x6c5671 in __perf_event__synthesize_id_index synthetic-events.c:1862 gregkh#5 0x6c5ac0 in perf_event__synthesize_id_index synthetic-events.c:1943 gregkh#6 0x458090 in record__synthesize builtin-record.c:2075 gregkh#7 0x45a85a in __cmd_record builtin-record.c:2888 gregkh#8 0x45deb6 in cmd_record builtin-record.c:4374 gregkh#9 0x4e5e33 in run_builtin perf.c:349 gregkh#10 0x4e60bf in handle_internal_command perf.c:401 gregkh#11 0x4e6215 in run_argv perf.c:448 gregkh#12 0x4e653a in main perf.c:555 gregkh#13 0x7f8454e4fa72 in __libc_start_main libc-2.32.so[3ea72] gregkh#14 0x43a3ee in _start ??:0 The --overwrite option implies --tail-synthesize, which collects non-sample events reflecting the system status when recording finishes. However, when evsel opening fails (e.g., unsupported event 'cycles-ct'), session->evlist is not initialized and remains NULL. The code unconditionally calls record__synthesize() in the error path, which iterates through the NULL evlist pointer and causes a segfault. To fix it, move the record__synthesize() call inside the error check block, so it's only called when there was no error during recording, ensuring that evlist is properly initialized. Fixes: 4ea648a ("perf record: Add --tail-synthesize option") Signed-off-by: Shuai Xue <[email protected]> Signed-off-by: Namhyung Kim <[email protected]>

When interrupting perf stat in repeat mode with a signal the signal is passed to the child process but the repeat doesn't terminate: ``` $ perf stat -v --null --repeat 10 sleep 1 Control descriptor is not initialized [ perf stat: executing run gregkh#1 ... ] [ perf stat: executing run gregkh#2 ... ] ^Csleep: Interrupt [ perf stat: executing run gregkh#3 ... ] [ perf stat: executing run gregkh#4 ... ] [ perf stat: executing run gregkh#5 ... ] [ perf stat: executing run gregkh#6 ... ] [ perf stat: executing run gregkh#7 ... ] [ perf stat: executing run gregkh#8 ... ] [ perf stat: executing run gregkh#9 ... ] [ perf stat: executing run gregkh#10 ... ] Performance counter stats for 'sleep 1' (10 runs): 0.9500 +- 0.0512 seconds time elapsed ( +- 5.39% ) 0.01user 0.02system 0:09.53elapsed 0%CPU (0avgtext+0avgdata 18940maxresident)k 29944inputs+0outputs (0major+2629minor)pagefaults 0swaps ``` Terminate the repeated run and give a reasonable exit value: ``` $ perf stat -v --null --repeat 10 sleep 1 Control descriptor is not initialized [ perf stat: executing run gregkh#1 ... ] [ perf stat: executing run gregkh#2 ... ] [ perf stat: executing run gregkh#3 ... ] ^Csleep: Interrupt Performance counter stats for 'sleep 1' (10 runs): 0.680 +- 0.321 seconds time elapsed ( +- 47.16% ) Command exited with non-zero status 130 0.00user 0.01system 0:02.05elapsed 0%CPU (0avgtext+0avgdata 70688maxresident)k 0inputs+0outputs (0major+5002minor)pagefaults 0swaps ``` Note, this also changes the exit value for non-repeat runs when interrupted by a signal. Reported-by: Ingo Molnar <[email protected]> Closes: https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Ian Rogers <[email protected]> Tested-by: Thomas Richter <[email protected]> Signed-off-by: Namhyung Kim <[email protected]>

[ Upstream commit 163e5f2 ] When using perf record with the `--overwrite` option, a segmentation fault occurs if an event fails to open. For example: perf record -e cycles-ct -F 1000 -a --overwrite Error: cycles-ct:H: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' perf: Segmentation fault #0 0x6466b6 in dump_stack debug.c:366 gregkh#1 0x646729 in sighandler_dump_stack debug.c:378 gregkh#2 0x453fd1 in sigsegv_handler builtin-record.c:722 gregkh#3 0x7f8454e65090 in __restore_rt libc-2.32.so[54090] gregkh#4 0x6c5671 in __perf_event__synthesize_id_index synthetic-events.c:1862 gregkh#5 0x6c5ac0 in perf_event__synthesize_id_index synthetic-events.c:1943 gregkh#6 0x458090 in record__synthesize builtin-record.c:2075 gregkh#7 0x45a85a in __cmd_record builtin-record.c:2888 gregkh#8 0x45deb6 in cmd_record builtin-record.c:4374 gregkh#9 0x4e5e33 in run_builtin perf.c:349 gregkh#10 0x4e60bf in handle_internal_command perf.c:401 gregkh#11 0x4e6215 in run_argv perf.c:448 gregkh#12 0x4e653a in main perf.c:555 gregkh#13 0x7f8454e4fa72 in __libc_start_main libc-2.32.so[3ea72] gregkh#14 0x43a3ee in _start ??:0 The --overwrite option implies --tail-synthesize, which collects non-sample events reflecting the system status when recording finishes. However, when evsel opening fails (e.g., unsupported event 'cycles-ct'), session->evlist is not initialized and remains NULL. The code unconditionally calls record__synthesize() in the error path, which iterates through the NULL evlist pointer and causes a segfault. To fix it, move the record__synthesize() call inside the error check block, so it's only called when there was no error during recording, ensuring that evlist is properly initialized. Fixes: 4ea648a ("perf record: Add --tail-synthesize option") Signed-off-by: Shuai Xue <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit 1e4b207 ] The following warning appears when running syzkaller, and this issue also exists in the mainline code. ------------[ cut here ]------------ list_add double add: new=ffffffffa57eee28, prev=ffffffffa57eee28, next=ffffffffa5e63100. WARNING: CPU: 0 PID: 1491 at lib/list_debug.c:35 __list_add_valid_or_report+0xf7/0x130 Modules linked in: CPU: 0 PID: 1491 Comm: syz.1.28 Not tainted 6.6.0+ gregkh#3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:__list_add_valid_or_report+0xf7/0x130 RSP: 0018:ff1100010dfb7b78 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffffa57eee18 RCX: ffffffff97fc9817 RDX: 0000000000040000 RSI: ffa0000002383000 RDI: 0000000000000001 RBP: ffffffffa57eee28 R08: 0000000000000001 R09: ffe21c0021bf6f2c R10: 0000000000000001 R11: 6464615f7473696c R12: ffffffffa5e63100 R13: ffffffffa57eee28 R14: ffffffffa57eee28 R15: ff1100010dfb7d48 FS: 00007fb14398b640(0000) GS:ff11000119600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000010d096005 CR4: 0000000000773ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 80000000 Call Trace: <TASK> input_register_handler+0xb3/0x210 mac_hid_start_emulation+0x1c5/0x290 mac_hid_toggle_emumouse+0x20a/0x240 proc_sys_call_handler+0x4c2/0x6e0 new_sync_write+0x1b1/0x2d0 vfs_write+0x709/0x950 ksys_write+0x12a/0x250 do_syscall_64+0x5a/0x110 entry_SYSCALL_64_after_hwframe+0x78/0xe2 The WARNING occurs when two processes concurrently write to the mac-hid emulation sysctl, causing a race condition in mac_hid_toggle_emumouse(). Both processes read old_val=0, then both try to register the input handler, leading to a double list_add of the same handler. CPU0 CPU1 ------------------------- ------------------------- vfs_write() //write 1 vfs_write() //write 1 proc_sys_write() proc_sys_write() mac_hid_toggle_emumouse() mac_hid_toggle_emumouse() old_val = *valp // old_val=0 old_val = *valp // old_val=0 mutex_lock_killable() proc_dointvec() // *valp=1 mac_hid_start_emulation() input_register_handler() mutex_unlock() mutex_lock_killable() proc_dointvec() mac_hid_start_emulation() input_register_handler() //Trigger Warning mutex_unlock() Fix this by moving the old_val read inside the mutex lock region. Fixes: 99b089c ("Input: Mac button emulation - implement as an input filter") Signed-off-by: Long Li <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit a907f3a ] 1. fadvise(fd1, POSIX_FADV_NOREUSE, {0,3}); 2. fadvise(fd2, POSIX_FADV_NOREUSE, {1,2}); 3. fadvise(fd3, POSIX_FADV_NOREUSE, {3,1}); 4. echo 1024 > /sys/fs/f2fs/tuning/reclaim_caches_kb This gives a way to reclaim file-backed pages by iterating all f2fs mounts until reclaiming 1MB page cache ranges, registered by gregkh#1, gregkh#2, and gregkh#3. 5. cat /sys/fs/f2fs/tuning/reclaim_caches_kb -> gives total number of registered file ranges. Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]> Stable-dep-of: e462fc4 ("f2fs: maintain one time GC mode is enabled during whole zoned GC cycle") Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit 163e5f2 ] When using perf record with the `--overwrite` option, a segmentation fault occurs if an event fails to open. For example: perf record -e cycles-ct -F 1000 -a --overwrite Error: cycles-ct:H: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' perf: Segmentation fault #0 0x6466b6 in dump_stack debug.c:366 #1 0x646729 in sighandler_dump_stack debug.c:378 #2 0x453fd1 in sigsegv_handler builtin-record.c:722 #3 0x7f8454e65090 in __restore_rt libc-2.32.so[54090] #4 0x6c5671 in __perf_event__synthesize_id_index synthetic-events.c:1862 #5 0x6c5ac0 in perf_event__synthesize_id_index synthetic-events.c:1943 #6 0x458090 in record__synthesize builtin-record.c:2075 #7 0x45a85a in __cmd_record builtin-record.c:2888 #8 0x45deb6 in cmd_record builtin-record.c:4374 #9 0x4e5e33 in run_builtin perf.c:349 #10 0x4e60bf in handle_internal_command perf.c:401 #11 0x4e6215 in run_argv perf.c:448 #12 0x4e653a in main perf.c:555 #13 0x7f8454e4fa72 in __libc_start_main libc-2.32.so[3ea72] #14 0x43a3ee in _start ??:0 The --overwrite option implies --tail-synthesize, which collects non-sample events reflecting the system status when recording finishes. However, when evsel opening fails (e.g., unsupported event 'cycles-ct'), session->evlist is not initialized and remains NULL. The code unconditionally calls record__synthesize() in the error path, which iterates through the NULL evlist pointer and causes a segfault. To fix it, move the record__synthesize() call inside the error check block, so it's only called when there was no error during recording, ensuring that evlist is properly initialized. Fixes: 4ea648a ("perf record: Add --tail-synthesize option") Signed-off-by: Shuai Xue <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit 1e4b207 ] The following warning appears when running syzkaller, and this issue also exists in the mainline code. ------------[ cut here ]------------ list_add double add: new=ffffffffa57eee28, prev=ffffffffa57eee28, next=ffffffffa5e63100. WARNING: CPU: 0 PID: 1491 at lib/list_debug.c:35 __list_add_valid_or_report+0xf7/0x130 Modules linked in: CPU: 0 PID: 1491 Comm: syz.1.28 Not tainted 6.6.0+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:__list_add_valid_or_report+0xf7/0x130 RSP: 0018:ff1100010dfb7b78 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffffa57eee18 RCX: ffffffff97fc9817 RDX: 0000000000040000 RSI: ffa0000002383000 RDI: 0000000000000001 RBP: ffffffffa57eee28 R08: 0000000000000001 R09: ffe21c0021bf6f2c R10: 0000000000000001 R11: 6464615f7473696c R12: ffffffffa5e63100 R13: ffffffffa57eee28 R14: ffffffffa57eee28 R15: ff1100010dfb7d48 FS: 00007fb14398b640(0000) GS:ff11000119600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000010d096005 CR4: 0000000000773ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 80000000 Call Trace: <TASK> input_register_handler+0xb3/0x210 mac_hid_start_emulation+0x1c5/0x290 mac_hid_toggle_emumouse+0x20a/0x240 proc_sys_call_handler+0x4c2/0x6e0 new_sync_write+0x1b1/0x2d0 vfs_write+0x709/0x950 ksys_write+0x12a/0x250 do_syscall_64+0x5a/0x110 entry_SYSCALL_64_after_hwframe+0x78/0xe2 The WARNING occurs when two processes concurrently write to the mac-hid emulation sysctl, causing a race condition in mac_hid_toggle_emumouse(). Both processes read old_val=0, then both try to register the input handler, leading to a double list_add of the same handler. CPU0 CPU1 ------------------------- ------------------------- vfs_write() //write 1 vfs_write() //write 1 proc_sys_write() proc_sys_write() mac_hid_toggle_emumouse() mac_hid_toggle_emumouse() old_val = *valp // old_val=0 old_val = *valp // old_val=0 mutex_lock_killable() proc_dointvec() // *valp=1 mac_hid_start_emulation() input_register_handler() mutex_unlock() mutex_lock_killable() proc_dointvec() mac_hid_start_emulation() input_register_handler() //Trigger Warning mutex_unlock() Fix this by moving the old_val read inside the mutex lock region. Fixes: 99b089c ("Input: Mac button emulation - implement as an input filter") Signed-off-by: Long Li <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit 163e5f2 ] When using perf record with the `--overwrite` option, a segmentation fault occurs if an event fails to open. For example: perf record -e cycles-ct -F 1000 -a --overwrite Error: cycles-ct:H: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' perf: Segmentation fault #0 0x6466b6 in dump_stack debug.c:366 #1 0x646729 in sighandler_dump_stack debug.c:378 #2 0x453fd1 in sigsegv_handler builtin-record.c:722 #3 0x7f8454e65090 in __restore_rt libc-2.32.so[54090] #4 0x6c5671 in __perf_event__synthesize_id_index synthetic-events.c:1862 #5 0x6c5ac0 in perf_event__synthesize_id_index synthetic-events.c:1943 #6 0x458090 in record__synthesize builtin-record.c:2075 #7 0x45a85a in __cmd_record builtin-record.c:2888 #8 0x45deb6 in cmd_record builtin-record.c:4374 #9 0x4e5e33 in run_builtin perf.c:349 #10 0x4e60bf in handle_internal_command perf.c:401 #11 0x4e6215 in run_argv perf.c:448 #12 0x4e653a in main perf.c:555 #13 0x7f8454e4fa72 in __libc_start_main libc-2.32.so[3ea72] #14 0x43a3ee in _start ??:0 The --overwrite option implies --tail-synthesize, which collects non-sample events reflecting the system status when recording finishes. However, when evsel opening fails (e.g., unsupported event 'cycles-ct'), session->evlist is not initialized and remains NULL. The code unconditionally calls record__synthesize() in the error path, which iterates through the NULL evlist pointer and causes a segfault. To fix it, move the record__synthesize() call inside the error check block, so it's only called when there was no error during recording, ensuring that evlist is properly initialized. Fixes: 4ea648a ("perf record: Add --tail-synthesize option") Signed-off-by: Shuai Xue <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit 1e4b207 ] The following warning appears when running syzkaller, and this issue also exists in the mainline code. ------------[ cut here ]------------ list_add double add: new=ffffffffa57eee28, prev=ffffffffa57eee28, next=ffffffffa5e63100. WARNING: CPU: 0 PID: 1491 at lib/list_debug.c:35 __list_add_valid_or_report+0xf7/0x130 Modules linked in: CPU: 0 PID: 1491 Comm: syz.1.28 Not tainted 6.6.0+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:__list_add_valid_or_report+0xf7/0x130 RSP: 0018:ff1100010dfb7b78 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffffa57eee18 RCX: ffffffff97fc9817 RDX: 0000000000040000 RSI: ffa0000002383000 RDI: 0000000000000001 RBP: ffffffffa57eee28 R08: 0000000000000001 R09: ffe21c0021bf6f2c R10: 0000000000000001 R11: 6464615f7473696c R12: ffffffffa5e63100 R13: ffffffffa57eee28 R14: ffffffffa57eee28 R15: ff1100010dfb7d48 FS: 00007fb14398b640(0000) GS:ff11000119600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000010d096005 CR4: 0000000000773ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 80000000 Call Trace: <TASK> input_register_handler+0xb3/0x210 mac_hid_start_emulation+0x1c5/0x290 mac_hid_toggle_emumouse+0x20a/0x240 proc_sys_call_handler+0x4c2/0x6e0 new_sync_write+0x1b1/0x2d0 vfs_write+0x709/0x950 ksys_write+0x12a/0x250 do_syscall_64+0x5a/0x110 entry_SYSCALL_64_after_hwframe+0x78/0xe2 The WARNING occurs when two processes concurrently write to the mac-hid emulation sysctl, causing a race condition in mac_hid_toggle_emumouse(). Both processes read old_val=0, then both try to register the input handler, leading to a double list_add of the same handler. CPU0 CPU1 ------------------------- ------------------------- vfs_write() //write 1 vfs_write() //write 1 proc_sys_write() proc_sys_write() mac_hid_toggle_emumouse() mac_hid_toggle_emumouse() old_val = *valp // old_val=0 old_val = *valp // old_val=0 mutex_lock_killable() proc_dointvec() // *valp=1 mac_hid_start_emulation() input_register_handler() mutex_unlock() mutex_lock_killable() proc_dointvec() mac_hid_start_emulation() input_register_handler() //Trigger Warning mutex_unlock() Fix this by moving the old_val read inside the mutex lock region. Fixes: 99b089c ("Input: Mac button emulation - implement as an input filter") Signed-off-by: Long Li <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Sasha Levin <[email protected]>

We sometimes observe use-after-free when dereferencing a neighbour [1]. The problem seems to be that the driver stores a pointer to the neighbour, but without holding a reference on it. A reference is only taken when the neighbour is used by a nexthop. Fix by simplifying the reference counting scheme. Always take a reference when storing a neighbour pointer in a neighbour entry. Avoid taking a referencing when the neighbour is used by a nexthop as the neighbour entry associated with the nexthop already holds a reference. Tested by running the test that uncovered the problem over 300 times. Without this patch the problem was reproduced after a handful of iterations. [1] BUG: KASAN: slab-use-after-free in mlxsw_sp_neigh_entry_update+0x2d4/0x310 Read of size 8 at addr ffff88817f8e3420 by task ip/3929 CPU: 3 UID: 0 PID: 3929 Comm: ip Not tainted 6.18.0-rc4-virtme-g36b21a067510 gregkh#3 PREEMPT(full) Hardware name: Nvidia SN5600/VMOD0013, BIOS 5.13 05/31/2023 Call Trace: <TASK> dump_stack_lvl+0x6f/0xa0 print_address_description.constprop.0+0x6e/0x300 print_report+0xfc/0x1fb kasan_report+0xe4/0x110 mlxsw_sp_neigh_entry_update+0x2d4/0x310 mlxsw_sp_router_rif_gone_sync+0x35f/0x510 mlxsw_sp_rif_destroy+0x1ea/0x730 mlxsw_sp_inetaddr_port_vlan_event+0xa1/0x1b0 __mlxsw_sp_inetaddr_lag_event+0xcc/0x130 __mlxsw_sp_inetaddr_event+0xf5/0x3c0 mlxsw_sp_router_netdevice_event+0x1015/0x1580 notifier_call_chain+0xcc/0x150 call_netdevice_notifiers_info+0x7e/0x100 __netdev_upper_dev_unlink+0x10b/0x210 netdev_upper_dev_unlink+0x79/0xa0 vrf_del_slave+0x18/0x50 do_set_master+0x146/0x7d0 do_setlink.isra.0+0x9a0/0x2880 rtnl_newlink+0x637/0xb20 rtnetlink_rcv_msg+0x6fe/0xb90 netlink_rcv_skb+0x123/0x380 netlink_unicast+0x4a3/0x770 netlink_sendmsg+0x75b/0xc90 __sock_sendmsg+0xbe/0x160 ____sys_sendmsg+0x5b2/0x7d0 ___sys_sendmsg+0xfd/0x180 __sys_sendmsg+0x124/0x1c0 do_syscall_64+0xbb/0xfd0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 [...] Allocated by task 109: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x7b/0x90 __kmalloc_noprof+0x2c1/0x790 neigh_alloc+0x6af/0x8f0 ___neigh_create+0x63/0xe90 mlxsw_sp_nexthop_neigh_init+0x430/0x7e0 mlxsw_sp_nexthop_type_init+0x212/0x960 mlxsw_sp_nexthop6_group_info_init.constprop.0+0x81f/0x1280 mlxsw_sp_nexthop6_group_get+0x392/0x6a0 mlxsw_sp_fib6_entry_create+0x46a/0xfd0 mlxsw_sp_router_fib6_replace+0x1ed/0x5f0 mlxsw_sp_router_fib6_event_work+0x10a/0x2a0 process_one_work+0xd57/0x1390 worker_thread+0x4d6/0xd40 kthread+0x355/0x5b0 ret_from_fork+0x1d4/0x270 ret_from_fork_asm+0x11/0x20 Freed by task 154: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x43/0x70 kmem_cache_free_bulk.part.0+0x1eb/0x5e0 kvfree_rcu_bulk+0x1f2/0x260 kfree_rcu_work+0x130/0x1b0 process_one_work+0xd57/0x1390 worker_thread+0x4d6/0xd40 kthread+0x355/0x5b0 ret_from_fork+0x1d4/0x270 ret_from_fork_asm+0x11/0x20 Last potentially related work creation: kasan_save_stack+0x30/0x50 kasan_record_aux_stack+0x8c/0xa0 kvfree_call_rcu+0x93/0x5b0 mlxsw_sp_router_neigh_event_work+0x67d/0x860 process_one_work+0xd57/0x1390 worker_thread+0x4d6/0xd40 kthread+0x355/0x5b0 ret_from_fork+0x1d4/0x270 ret_from_fork_asm+0x11/0x20 Fixes: 6cf3c97 ("mlxsw: spectrum_router: Add private neigh table") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Petr Machata <[email protected]> Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/92d75e21d95d163a41b5cea67a15cd33f547cba6.1764695650.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <[email protected]>

Petr Machata says: ==================== selftests: forwarding: vxlan_bridge_1q_mc_ul: Fix flakiness The net/forwarding/vxlan_bridge_1q_mc_ul selftest runs an overlay traffic, forwarded over a multicast-routed VXLAN underlay. In order to determine whether packets reach their intended destination, it uses a TC match. For convenience, it uses a flower match, which however does not allow matching on the encapsulated packet. So various service traffic ends up being indistinguishable from the test packets, and ends up confusing the test. To alleviate the problem, the test uses sleep to allow the necessary service traffic to run and clear the channel, before running the test traffic. This worked for a while, but lately we have nevertheless seen flakiness of the test in the CI. In this patchset, first generalize tc_rule_stats_get() to support u32 in patch gregkh#1, then in patch gregkh#2 convert the test to use u32 to allow parsing deeper into the packet, and in gregkh#3 drop the now-unnecessary sleep. ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Fix a loop scenario of ethx:egress->ethx:egress Example setup to reproduce: tc qdisc add dev ethx root handle 1: drr tc filter add dev ethx parent 1: protocol ip prio 1 matchall \ action mirred egress redirect dev ethx Now ping out of ethx and you get a deadlock: [ 116.892898][ T307] ============================================ [ 116.893182][ T307] WARNING: possible recursive locking detected [ 116.893418][ T307] 6.18.0-rc6-01205-ge05021a829b8-dirty #204 Not tainted [ 116.893682][ T307] -------------------------------------------- [ 116.893926][ T307] ping/307 is trying to acquire lock: [ 116.894133][ T307] ffff88800c122908 (&sch->root_lock_key){+...}-{3:3}, at: __dev_queue_xmit+0x2210/0x3b50 [ 116.894517][ T307] [ 116.894517][ T307] but task is already holding lock: [ 116.894836][ T307] ffff88800c122908 (&sch->root_lock_key){+...}-{3:3}, at: __dev_queue_xmit+0x2210/0x3b50 [ 116.895252][ T307] [ 116.895252][ T307] other info that might help us debug this: [ 116.895608][ T307] Possible unsafe locking scenario: [ 116.895608][ T307] [ 116.895901][ T307] CPU0 [ 116.896057][ T307] ---- [ 116.896200][ T307] lock(&sch->root_lock_key); [ 116.896392][ T307] lock(&sch->root_lock_key); [ 116.896605][ T307] [ 116.896605][ T307] *** DEADLOCK *** [ 116.896605][ T307] [ 116.896864][ T307] May be due to missing lock nesting notation [ 116.896864][ T307] [ 116.897123][ T307] 6 locks held by ping/307: [ 116.897302][ T307] #0: ffff88800b4b0250 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0xb20/0x2cf0 [ 116.897808][ T307] gregkh#1: ffffffff88c839c0 (rcu_read_lock){....}-{1:3}, at: ip_output+0xa9/0x600 [ 116.898138][ T307] gregkh#2: ffffffff88c839c0 (rcu_read_lock){....}-{1:3}, at: ip_finish_output2+0x2c6/0x1ee0 [ 116.898459][ T307] gregkh#3: ffffffff88c83960 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x200/0x3b50 [ 116.898782][ T307] gregkh#4: ffff88800c122908 (&sch->root_lock_key){+...}-{3:3}, at: __dev_queue_xmit+0x2210/0x3b50 [ 116.899132][ T307] gregkh#5: ffffffff88c83960 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x200/0x3b50 [ 116.899442][ T307] [ 116.899442][ T307] stack backtrace: [ 116.899667][ T307] CPU: 2 UID: 0 PID: 307 Comm: ping Not tainted 6.18.0-rc6-01205-ge05021a829b8-dirty #204 PREEMPT(voluntary) [ 116.899672][ T307] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 116.899675][ T307] Call Trace: [ 116.899678][ T307] <TASK> [ 116.899680][ T307] dump_stack_lvl+0x6f/0xb0 [ 116.899688][ T307] print_deadlock_bug.cold+0xc0/0xdc [ 116.899695][ T307] __lock_acquire+0x11f7/0x1be0 [ 116.899704][ T307] lock_acquire+0x162/0x300 [ 116.899707][ T307] ? __dev_queue_xmit+0x2210/0x3b50 [ 116.899713][ T307] ? srso_alias_return_thunk+0x5/0xfbef5 [ 116.899717][ T307] ? stack_trace_save+0x93/0xd0 [ 116.899723][ T307] _raw_spin_lock+0x30/0x40 [ 116.899728][ T307] ? __dev_queue_xmit+0x2210/0x3b50 [ 116.899731][ T307] __dev_queue_xmit+0x2210/0x3b50 Fixes: 178ca30 ("Revert "net/sched: Fix mirred deadlock on device recursion"") Tested-by: Victor Nogueira <[email protected]> Signed-off-by: Jamal Hadi Salim <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

Relicense as SSPL

3941433

This is crucial to be able to host the brand new doubled-down source-open Elasticsearch on Linux. Signed-off-by: Stuart Reinhild

gregkh closed this Jan 28, 2021

gregkh reviewed Jan 28, 2021

View reviewed changes

anchalag referenced this pull request in amazonlinux/linux Feb 7, 2021

Merge tag 'kvmarm-fixes-5.11-3' of git://git.kernel.org/pub/scm/linux…

074489b

…/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.11, take #3 - Avoid clobbering extra registers on initialisation

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Relicense as SSPL #3

Relicense as SSPL #3

Uh oh!

stuardreinhild commented Jan 28, 2021

Uh oh!

gregkh left a comment

Uh oh!

stuardreinhild commented Jan 28, 2021

Uh oh!

gregkh commented Jan 28, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Relicense as SSPL #3

Relicense as SSPL #3

Uh oh!

Conversation

stuardreinhild commented Jan 28, 2021

Uh oh!

gregkh left a comment

Choose a reason for hiding this comment

Uh oh!

stuardreinhild commented Jan 28, 2021

Uh oh!

gregkh commented Jan 28, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants