Merge latest 4.9.y stable patches (4.9.43) by gibsson · Pull Request #11 · Freescale/linux-fslc

gibsson · 2017-08-15T19:10:20Z

No description provided.

commit 340d394 upstream. The driver checks port->exists twice in i8042_interrupt(), first when trying to assign temporary "serio" variable, and second time when deciding whether it should call serio_interrupt(). The value of port->exists may change between the 2 checks, and we may end up calling serio_interrupt() with a NULL pointer: BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 IP: [<ffffffff8150feaf>] _spin_lock_irqsave+0x1f/0x40 PGD 0 Oops: 0002 [Freescale#1] SMP last sysfs file: CPU 0 Modules linked in: Pid: 1, comm: swapper Not tainted 2.6.32-358.el6.x86_64 Freescale#1 QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:[<ffffffff8150feaf>] [<ffffffff8150feaf>] _spin_lock_irqsave+0x1f/0x40 RSP: 0018:ffff880028203cc0 EFLAGS: 00010082 RAX: 0000000000010000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000282 RSI: 0000000000000098 RDI: 0000000000000050 RBP: ffff880028203cc0 R08: ffff88013e79c000 R09: ffff880028203ee0 R10: 0000000000000298 R11: 0000000000000282 R12: 0000000000000050 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000098 FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000050 CR3: 0000000001a85000 CR4: 00000000001407f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 1, threadinfo ffff88013e79c000, task ffff88013e79b500) Stack: ffff880028203d00 ffffffff813de186 ffffffffffffff02 0000000000000000 <d> 0000000000000000 0000000000000000 0000000000000000 0000000000000098 <d> ffff880028203d70 ffffffff813e0162 ffff880028203d20 ffffffff8103b8ac Call Trace: <IRQ> [<ffffffff813de186>] serio_interrupt+0x36/0xa0 [<ffffffff813e0162>] i8042_interrupt+0x132/0x3a0 [<ffffffff8103b8ac>] ? kvm_clock_read+0x1c/0x20 [<ffffffff8103b8b9>] ? kvm_clock_get_cycles+0x9/0x10 [<ffffffff810e1640>] handle_IRQ_event+0x60/0x170 [<ffffffff8103b154>] ? kvm_guest_apic_eoi_write+0x44/0x50 [<ffffffff810e3d8e>] handle_edge_irq+0xde/0x180 [<ffffffff8100de89>] handle_irq+0x49/0xa0 [<ffffffff81516c8c>] do_IRQ+0x6c/0xf0 [<ffffffff8100b9d3>] ret_from_intr+0x0/0x11 [<ffffffff81076f63>] ? __do_softirq+0x73/0x1e0 [<ffffffff8109b75b>] ? hrtimer_interrupt+0x14b/0x260 [<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30 [<ffffffff8100de05>] ? do_softirq+0x65/0xa0 [<ffffffff81076d95>] ? irq_exit+0x85/0x90 [<ffffffff81516d80>] ? smp_apic_timer_interrupt+0x70/0x9b [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20 To avoid the issue let's change the second check to test whether serio is NULL or not. Also, let's take i8042_lock in i8042_start() and i8042_stop() instead of trying to be overly smart and using memory barriers. Signed-off-by: Chen Hong <[email protected]> [dtor: take lock in i8042_start()/i8042_stop()] Signed-off-by: Dmitry Torokhov <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit c8c16d3 upstream. Under heavy iser target(scst) start/stop stress during login/logout on iser intitiator side happened trace call provided below. The function iscsi_iser_slave_alloc iser_conn pointer could be NULL, due to the fact that function iscsi_iser_conn_stop can be called before and free iser connection. Let's protect that flow by introducing global mutex. BUG: unable to handle kernel paging request at 0000000000001018 IP: [<ffffffffc0426f7e>] iscsi_iser_slave_alloc+0x1e/0x50 [ib_iser] Call Trace: ? scsi_alloc_sdev+0x242/0x300 scsi_probe_and_add_lun+0x9e1/0xea0 ? kfree_const+0x21/0x30 ? kobject_set_name_vargs+0x76/0x90 ? __pm_runtime_resume+0x5b/0x70 __scsi_scan_target+0xf6/0x250 scsi_scan_target+0xea/0x100 iscsi_user_scan_session.part.13+0x101/0x130 [scsi_transport_iscsi] ? iscsi_user_scan_session.part.13+0x130/0x130 [scsi_transport_iscsi] iscsi_user_scan_session+0x1e/0x30 [scsi_transport_iscsi] device_for_each_child+0x50/0x90 iscsi_user_scan+0x44/0x60 [scsi_transport_iscsi] store_scan+0xa8/0x100 ? common_file_perm+0x5d/0x1c0 dev_attr_store+0x18/0x30 sysfs_kf_write+0x37/0x40 kernfs_fop_write+0x12c/0x1c0 __vfs_write+0x18/0x40 vfs_write+0xb5/0x1a0 SyS_write+0x55/0xc0 Fixes: 318d311 ("iser: Accept arbitrary sg lists mapping if the device supports it") Signed-off-by: Vladimir Neyelov <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Doug Ledford <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit bebb2a4 upstream. In function addr_resolve() the namespace is a required input parameter and not an output. It is passed later for searching the routing table and device addresses. Also, it shouldn't be copied back to the caller. Fixes: 565edd1 ('IB/addr: Pass network namespace as a parameter') Signed-off-by: Moni Shoua <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 15a8b93 upstream. Otherwise, we enable a MAC forgery via timing attack. Signed-off-by: Jason A. Donenfeld <[email protected]> Cc: "J. Bruce Fields" <[email protected]> Cc: Jeff Layton <[email protected]> Cc: Trond Myklebust <[email protected]> Cc: Anna Schumaker <[email protected]> Cc: [email protected] Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit cc89684 upstream. Since commit bafc9b7 ("vfs: More precise tests in d_invalidate") in v3.18, a return of '0' from ->d_revalidate() will cause the dentry to be invalidated even if it has filesystems mounted on or it or on a descendant. The mounted filesystem is unmounted. This means we need to be careful not to return 0 unless the directory referred to truly is invalid. So -ESTALE or -ENOENT should invalidate the directory. Other errors such a -EPERM or -ERESTARTSYS should be returned from ->d_revalidate() so they are propagated to the caller. A particular problem can be demonstrated by: 1/ mount an NFS filesystem using NFSv3 on /mnt 2/ mount any other filesystem on /mnt/foo 3/ ls /mnt/foo 4/ turn off network, or otherwise make the server unable to respond 5/ ls /mnt/foo & 6/ cat /proc/$!/stack # note that nfs_lookup_revalidate is in the call stack 7/ kill -9 $! # this results in -ERESTARTSYS being returned 8/ observe that /mnt/foo has been unmounted. This patch changes nfs_lookup_revalidate() to only treat -ESTALE from nfs_lookup_verify_inode() and -ESTALE or -ENOENT from ->lookup() as indicating an invalid inode. Other errors are returned. Also nfs_check_inode_attributes() is changed to return -ESTALE rather than -EIO. This is consistent with the error returned in similar circumstances from nfs_update_inode(). As this bug allows any user to unmount a filesystem mounted on an NFS filesystem, this fix is suitable for stable kernels. Fixes: bafc9b7 ("vfs: More precise tests in d_invalidate") Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit f2e9535 upstream. udf_setsize() called truncate_setsize() with i_data_sem held. Thus truncate_pagecache() called from truncate_setsize() could lock a page under i_data_sem which can deadlock as page lock ranks below i_data_sem - e. g. writeback can hold page lock and try to acquire i_data_sem to map a block. Fix the problem by moving truncate_setsize() calls from under i_data_sem. It is safe for us to change i_size without holding i_data_sem as all the places that depend on i_size being stable already hold inode_lock. Fixes: 7e49b6f Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 1d6ef27 upstream. This patch addresses a COMPARE_AND_WRITE se_device->caw_sem leak, that would be triggered during normal se_cmd shutdown or abort via __transport_wait_for_tasks(). This would occur because target_complete_cmd() would catch this early and do complete_all(&cmd->t_transport_stop_comp), but since target_complete_ok_work() or target_complete_failure_work() are never called to invoke se_cmd->transport_complete_callback(), the COMPARE_AND_WRITE specific callbacks never release caw_sem. To address this special case, go ahead and release caw_sem directly from target_complete_cmd(). (Remove '&& success' from check, to release caw_sem regardless of scsi_status - nab) Signed-off-by: Jiang Yi <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…done commit fce50a2 upstream. This patch fixes a NULL pointer dereference in isert_login_recv_done() of isert_conn->cm_id due to isert_cma_handler() -> isert_connect_error() resetting isert_conn->cm_id = NULL during a failed login attempt. As per Sagi, we will always see the completion of all recv wrs posted on the qp (given that we assigned a ->done handler), this is a FLUSH error completion, we just don't get to verify that because we deref NULL before. The issue here, was the assumption that dereferencing the connection cm_id is always safe, which is not true since: commit 4a579da Author: Sagi Grimberg <[email protected]> Date: Sun Mar 29 15:52:04 2015 +0300 iser-target: Fix possible deadlock in RDMA_CM connection error As I see it, we have a direct reference to the isert_device from isert_conn which is the one-liner fix that we actually need like we do in isert_rdma_read_done() and isert_rdma_write_done(). Reported-by: Andrea Righi <[email protected]> Tested-by: Andrea Righi <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…o its target commit 80f6258 upstream. When the jump instruction is displayed at the row 0 in annotate view, the arrow is broken. An example: 16.86 │ ┌──je 82 0.01 │ movsd (%rsp),%xmm0 │ movsd 0x8(%rsp),%xmm4 │ movsd 0x8(%rsp),%xmm1 │ movsd (%rsp),%xmm3 │ divsd %xmm4,%xmm0 │ divsd %xmm3,%xmm1 │ movsd (%rsp),%xmm2 │ addsd %xmm1,%xmm0 │ addsd %xmm2,%xmm0 │ movsd %xmm0,(%rsp) │82: sub $0x1,%ebx 83.03 │ ↑ jne 38 │ add $0x10,%rsp │ xor %eax,%eax │ pop %rbx │ ← retq The patch increments the row number before checking with 0. Signed-off-by: Yao Jin <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Fixes: 944e1ab ("perf ui browser: Add method to draw up/down arrow line") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 6a8a75f upstream. This reverts commit cc1582c. This commit introduced a regression that broke rr-project, which uses sampling events to receive a signal on overflow (but does not care about the contents of the sample). These signals are critical to the correct operation of rr. There's been some back and forth about how to fix it - but to not keep applications in limbo queue up a revert. Reported-by: Kyle Huey <[email protected]> Acked-by: Kyle Huey <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Jin Yao <[email protected]> Cc: Vince Weaver <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Will Deacon <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/20170628105600.GC5981@leverpostej Signed-off-by: Ingo Molnar <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 5a1d4c5 upstream. Add support for USB Device TP-Link TL-WN722N v2. VendorID: 0x2357, ProductID: 0x010c Signed-off-by: Michael Gugino <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 15d5193 upstream. As reported by Éric Piel on the Comedi mailing list (see <https://groups.google.com/forum/#!topic/comedi_list/ueZiR7vTLOU/discussion>), the analog output asynchronous commands are running too fast with a period 50 ns shorter than it should be. This affects all boards with AO command support that are supported by the "ni_pcimio", "ni_atmio", and "ni_mio_cs" drivers. This is a regression bug introduced by commit 080e679 ("staging: comedi: ni_mio_common: Cleans up/clarifies ni_ao_cmd"), specifically, this line in `ni_ao_cmd_set_update()`: /* following line: N-1 per STC */ ni_stc_writel(dev, trigvar - 1, NISTC_AO_UI_LOADA_REG); The `trigvar` variable value comes from a call to `ni_ns_to_timer()` which converts a timer period in nanoseconds to a hardware divisor value. The function already reduces the divisor by 1 as required by the hardware, so the above line should not reduce it further by 1. Fix it by replacing `trigvar` by `trigvar - 1` in the above line, and remove the misleading comment. Reported-by: Éric Piel <[email protected]> Fixes: 080e679 ("staging: comedi: ni_mio_common: Cleans up/clarifies ni_ao_cmd") Cc: Éric Piel <[email protected]> Cc: Spencer E. Olson <[email protected]> Signed-off-by: Ian Abbott <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 740c433 upstream. If vesafb is enabled in the config then /dev/fb0 is created by vesa and this sm750 driver gets fb1, fb2. But we need to be fb0 and fb1 to effectively work with xorg. So if it has been alloted fb1, then try to remove the other fb0. In the previous send, why #ifdef is used was asked. https://lkml.org/lkml/2017/6/25/57 Answered at: https://lkml.org/lkml/2017/6/25/69 Also pasting here for reference. 'Did a quick research into "why". The patch d8801e4 ("x86/PCI: Set IORESOURCE_ROM_SHADOW only for the default VGA device") has started setting IORESOURCE_ROM_SHADOW in flags for a default VGA device and that is being done only for x86. And so, we will need that #ifdef to check IORESOURCE_ROM_SHADOW as that needs to be checked only for a x86 and not for other arch.' Signed-off-by: Teddy Wang <[email protected]> Signed-off-by: Sudip Mukherjee <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 566e1ce upstream. We now get a helpful warning for code that calls copy_{from,to}_iter without checking the return value, introduced by commit aa28de2 ("iov_iter/hardening: move object size checks to inlined part"). drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c: In function 'kiblnd_send': drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1643:2: error: ignoring return value of 'copy_from_iter', declared with attribute warn_unused_result [-Werror=unused-result] drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c: In function 'kiblnd_recv': drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1744:3: error: ignoring return value of 'copy_to_iter', declared with attribute warn_unused_result [-Werror=unused-result] In case we get short copies here, we may get incorrect behavior. I've added failure handling for both rx and tx now, returning -EFAULT as expected. Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: James Simmons <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 84583cf upstream. For a large directory, program needs to issue multiple readdir syscalls to get all dentries. When there are multiple programs read the directory concurrently. Following sequence of events can happen. - program calls readdir with pos = 2. ceph sends readdir request to mds. The reply contains N1 entries. ceph adds these N1 entries to readdir cache. - program calls readdir with pos = N1+2. The readdir is satisfied by the readdir cache, N2 entries are returned. (Other program calls readdir in the middle, which fills the cache) - program calls readdir with pos = N1+N2+2. ceph sends readdir request to mds. The reply contains N3 entries and it reaches directory end. ceph adds these N3 entries to the readdir cache and marks directory complete. The second readdir call does not update fi->readdir_cache_idx. ceph add the last N3 entries to wrong places. Signed-off-by: "Yan, Zheng" <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit a62ab66 upstream. Initialize the port_num for iWARP in rdma_init_qp_attr. Fixes: 5ecce4c("Check port number supplied by user verbs cmds") Reviewed-by: Steve Wise <[email protected]> Signed-off-by: Mustafa Ismail <[email protected]> Tested-by: Mike Marciniszyn <[email protected]> Signed-off-by: Doug Ledford <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 448421b upstream. Handle any error due to partial reads, timeouts etc. to avoid parsing uninitialized data subsequently. Also bail out if the parsing itself fails. Cc: Dave Airlie <[email protected]> Cc: Lyude <[email protected]> Cc: Daniel Vetter <[email protected]> Signed-off-by: Imre Deak <[email protected]> Reviewed-by: Lyude <[email protected]> Signed-off-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 7f8b398 upstream. In case of an unknown broadcast message is sent mstb will remain unset, so check for this. Cc: Dave Airlie <[email protected]> Cc: Lyude <[email protected]> Cc: Daniel Vetter <[email protected]> Signed-off-by: Imre Deak <[email protected]> Reviewed-by: Lyude <[email protected]> Signed-off-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

…ions commit 636c4c3 upstream. Currently we may process up/down message transactions containing uninitialized data. This can happen if there was an error during the reception of any message in the transaction, but we happened to receive the last message correctly with the end-of-message flag set. To avoid this abort the reception of the transaction when the first error is detected, rejecting any messages until a message with the start-of-message flag is received (which will start a new transaction). This is also what the DP 1.4 spec 2.11.8.2 calls for in this case. In addtion this also prevents receiving bogus transactions without the first message with the the start-of-message flag set. v2: - unchanged v3: - git add the part that actually skips messages after an error in drm_dp_sideband_msg_build() Cc: Dave Airlie <[email protected]> Cc: Lyude <[email protected]> Cc: Daniel Vetter <[email protected]> Signed-off-by: Imre Deak <[email protected]> Reviewed-by: Lyude <[email protected]> Signed-off-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 99975cd upstream. ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger than what fits into a single MR. .map_mr_sg() must not attempt to map more SG-list elements than what fits into a single MR. Hence make sure that mlx5_ib_sg_to_klms() does not write outside the MR klms[] array. Fixes: b005d31 ("mlx5: Add arbitrary sg list support") Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Max Gurtovoy <[email protected]> Cc: Sagi Grimberg <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Israel Rukshin <[email protected]> Acked-by: Leon Romanovsky <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Doug Ledford <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 8496946 upstream. When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit set, DIR1 is expected to have SGID bit set (and owning group equal to the owning group of 'DIR0'). However when 'DIR0' also has some default ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on 'DIR1' to get cleared if user is not member of the owning group. Fix the problem by creating __hfsplus_set_posix_acl() function that does not call posix_acl_update_mode() and use it when inheriting ACLs. That prevents SGID bit clearing and the mode has been properly set by posix_acl_create() anyway. Fixes: 0739310 Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 8fc646b upstream. On failure to prepare_creds(), mount fails with a random return value, as err was last set to an integer cast of a valid lower mnt pointer or set to 0 if inodes index feature is enabled. Reported-by: Dan Carpenter <[email protected]> Fixes: 3fe6e52 ("ovl: override creds with the ones from ...") Signed-off-by: Amir Goldstein <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 7e700d2 upstream. nfit_init() calls nfit_mce_register() on module load. When the module load fails the nfit mce decoder is not unregistered. The module's memory is freed leaving the decoder chain referencing junk. This will cause panics as future registrations will reference the free'd memory. Unregister the nfit mce decoder on module init failure. [v2]: register and then unregister mce handler to avoid losing mce events [v3]: also cleanup nfit workqueue Fixes: 6839a6d ("nfit: do an ARS scrub on hitting a latent media error") Cc: "Rafael J. Wysocki" <[email protected]> Cc: Len Brown <[email protected]> Cc: Vishal Verma <[email protected]> Cc: "Lee, Chun-Yi" <[email protected]> Cc: Linda Knippers <[email protected]> Cc: [email protected] Acked-by: Jeff Moyer <[email protected]> Signed-off-by: Prarit Bhargava <[email protected]> Reviewed-by: Vishal Verma <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 7a3b7cd upstream. The ULPI bus can be built as a module, and it will soon be calling these functions when it supports probing devices from DT. Export them so they can be used by the ULPI module. Acked-by: Rob Herring <[email protected]> Cc: <[email protected]> Signed-off-by: Stephen Boyd <[email protected]> Signed-off-by: Peter Chen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit d50daa2 upstream. Include the OF-based modalias in the uevent sent when registering SPMI devices, so that user space has a chance to autoload the kernel module for the device. Tested-by: Rob Clark <[email protected]> Reported-by: Rob Clark <[email protected]> Reviewed-by: Stephen Boyd <[email protected]> Signed-off-by: Bjorn Andersson <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 6883cd7 upstream. When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit set, DIR1 is expected to have SGID bit set (and owning group equal to the owning group of 'DIR0'). However when 'DIR0' also has some default ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on 'DIR1' to get cleared if user is not member of the owning group. Fix the problem by moving posix_acl_update_mode() out of __reiserfs_set_acl() into reiserfs_set_acl(). That way the function will not be called when inheriting ACLs which is what we want as it prevents SGID bit clearing and the mode has been properly set by posix_acl_create() anyway. Fixes: 0739310 CC: [email protected] Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…eds if present commit 975e83c upstream. If the genpd->attach_dev or genpd->power_on fails, genpd_dev_pm_attach may return -EPROBE_DEFER initially. However genpd_alloc_dev_data sets the PM domain for the device unconditionally. When subsequent attempts are made to call genpd_dev_pm_attach, it may return -EEXISTS checking dev->pm_domain without re-attempting to call attach_dev or power_on. platform_drv_probe then attempts to call drv->probe as the return value -EEXIST != -EPROBE_DEFER, which may end up in a situation where the device is accessed without it's power domain switched on. Fixes: f104e1e (PM / Domains: Re-order initialization of generic_pm_domain_data) Signed-off-by: Sudeep Holla <[email protected]> Acked-by: Ulf Hansson <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit db9108e upstream. Hit the kmemleak when executing instance_rmdir, it forgot releasing mem of tracing_cpumask. With this fix, the warn does not appear any more. unreferenced object 0xffff93a8dfaa7c18 (size 8): comm "mkdir", pid 1436, jiffies 4294763622 (age 9134.308s) hex dump (first 8 bytes): ff ff ff ff ff ff ff ff ........ backtrace: [<ffffffff88b6567a>] kmemleak_alloc+0x4a/0xa0 [<ffffffff8861ea41>] __kmalloc_node+0xf1/0x280 [<ffffffff88b505d3>] alloc_cpumask_var_node+0x23/0x30 [<ffffffff88b5060e>] alloc_cpumask_var+0xe/0x10 [<ffffffff88571ab0>] instance_mkdir+0x90/0x240 [<ffffffff886e5100>] tracefs_syscall_mkdir+0x40/0x70 [<ffffffff886565c9>] vfs_mkdir+0x109/0x1b0 [<ffffffff8865b1d0>] SyS_mkdir+0xd0/0x100 [<ffffffff88403857>] do_syscall_64+0x67/0x150 [<ffffffff88b710e7>] return_from_SYSCALL_64+0x0/0x6a [<ffffffffffffffff>] 0xffffffffffffffff Link: http://lkml.kernel.org/r/[email protected] Fixes: ccfe9e4 ("tracing: Make tracing_cpumask available for all instances") Signed-off-by: Chunyu Hu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

Commit ff86bf0 ("alarmtimer: Rate limit periodic intervals") sets a minimum bound on the alarm timer interval. This minimum bound shouldn't be applied if the interval is 0. Otherwise, one-shot timers will be converted into periodic ones. Fixes: ff86bf0 ("alarmtimer: Rate limit periodic intervals") Reported-by: Ben Fennema <[email protected]> Signed-off-by: Greg Hackmann <[email protected]> Cc: [email protected] Cc: John Stultz <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit cf3591e upstream. Revert the commit bd293d0. The proper fix has been made available with commit d0a255e ("loop: set PF_MEMALLOC_NOIO for the worker thread"). Note that the fix offered by commit bd293d0 doesn't really prevent the deadlock from occuring - if we look at the stacktrace reported by Junxiao Bi, we see that it hangs in bit_wait_io and not on the mutex - i.e. it has already successfully taken the mutex. Changing the mutex from mutex_lock to mutex_trylock won't help with deadlocks that happen afterwards. PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0" #0 [ffff8813dedfb938] __schedule at ffffffff8173f405 Freescale#1 [ffff8813dedfb990] schedule at ffffffff8173fa27 Freescale#2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec Freescale#3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186 Freescale#4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f Freescale#5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8 Freescale#6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81 Freescale#7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio] Freescale#8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio] Freescale#9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio] Freescale#10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce Freescale#11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778 Freescale#12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f Freescale#13 [ffff8813dedfbec0] kthread at ffffffff810a8428 Freescale#14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242 Signed-off-by: Mikulas Patocka <[email protected]> Cc: [email protected] Fixes: bd293d0 ("dm bufio: fix deadlock with loop device") Depends-on: d0a255e ("loop: set PF_MEMALLOC_NOIO for the worker thread") Signed-off-by: Mike Snitzer <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…ll_sock(). [ Upstream commit cb66ddd ] When it is to cleanup net namespace, rds_tcp_exit_net() will call rds_tcp_kill_sock(), if t_sock is NULL, it will not call rds_conn_destroy(), rds_conn_path_destroy() and rds_tcp_conn_free() to free connection, and the worker cp_conn_w is not stopped, afterwards the net is freed in net_drop_ns(); While cp_conn_w rds_connect_worker() will call rds_tcp_conn_path_connect() and reference 'net' which has already been freed. In rds_tcp_conn_path_connect(), rds_tcp_set_callbacks() will set t_sock = sock before sock->ops->connect, but if connect() is failed, it will call rds_tcp_restore_callbacks() and set t_sock = NULL, if connect is always failed, rds_connect_worker() will try to reconnect all the time, so rds_tcp_kill_sock() will never to cancel worker cp_conn_w and free the connections. Therefore, the condition !tc->t_sock is not needed if it is going to do cleanup_net->rds_tcp_exit_net->rds_tcp_kill_sock, because tc->t_sock is always NULL, and there is on other path to cancel cp_conn_w and free connection. So this patch is to fix this. rds_tcp_kill_sock(): ... if (net != c_net || !tc->t_sock) ... Acked-by: Santosh Shilimkar <[email protected]> ================================================================== BUG: KASAN: use-after-free in inet_create+0xbcc/0xd28 net/ipv4/af_inet.c:340 Read of size 4 at addr ffff8003496a4684 by task kworker/u8:4/3721 CPU: 3 PID: 3721 Comm: kworker/u8:4 Not tainted 5.1.0 Freescale#11 Hardware name: linux,dummy-virt (DT) Workqueue: krdsd rds_connect_worker Call trace: dump_backtrace+0x0/0x3c0 arch/arm64/kernel/time.c:53 show_stack+0x28/0x38 arch/arm64/kernel/traps.c:152 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x120/0x188 lib/dump_stack.c:113 print_address_description+0x68/0x278 mm/kasan/report.c:253 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x21c/0x348 mm/kasan/report.c:409 __asan_report_load4_noabort+0x30/0x40 mm/kasan/report.c:429 inet_create+0xbcc/0xd28 net/ipv4/af_inet.c:340 __sock_create+0x4f8/0x770 net/socket.c:1276 sock_create_kern+0x50/0x68 net/socket.c:1322 rds_tcp_conn_path_connect+0x2b4/0x690 net/rds/tcp_connect.c:114 rds_connect_worker+0x108/0x1d0 net/rds/threads.c:175 process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153 worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296 kthread+0x2f0/0x378 kernel/kthread.c:255 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117 Allocated by task 687: save_stack mm/kasan/kasan.c:448 [inline] set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xd4/0x180 mm/kasan/kasan.c:553 kasan_slab_alloc+0x14/0x20 mm/kasan/kasan.c:490 slab_post_alloc_hook mm/slab.h:444 [inline] slab_alloc_node mm/slub.c:2705 [inline] slab_alloc mm/slub.c:2713 [inline] kmem_cache_alloc+0x14c/0x388 mm/slub.c:2718 kmem_cache_zalloc include/linux/slab.h:697 [inline] net_alloc net/core/net_namespace.c:384 [inline] copy_net_ns+0xc4/0x2d0 net/core/net_namespace.c:424 create_new_namespaces+0x300/0x658 kernel/nsproxy.c:107 unshare_nsproxy_namespaces+0xa0/0x198 kernel/nsproxy.c:206 ksys_unshare+0x340/0x628 kernel/fork.c:2577 __do_sys_unshare kernel/fork.c:2645 [inline] __se_sys_unshare kernel/fork.c:2643 [inline] __arm64_sys_unshare+0x38/0x58 kernel/fork.c:2643 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall arch/arm64/kernel/syscall.c:47 [inline] el0_svc_common+0x168/0x390 arch/arm64/kernel/syscall.c:83 el0_svc_handler+0x60/0xd0 arch/arm64/kernel/syscall.c:129 el0_svc+0x8/0xc arch/arm64/kernel/entry.S:960 Freed by task 264: save_stack mm/kasan/kasan.c:448 [inline] set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x114/0x220 mm/kasan/kasan.c:521 kasan_slab_free+0x10/0x18 mm/kasan/kasan.c:528 slab_free_hook mm/slub.c:1370 [inline] slab_free_freelist_hook mm/slub.c:1397 [inline] slab_free mm/slub.c:2952 [inline] kmem_cache_free+0xb8/0x3a8 mm/slub.c:2968 net_free net/core/net_namespace.c:400 [inline] net_drop_ns.part.6+0x78/0x90 net/core/net_namespace.c:407 net_drop_ns net/core/net_namespace.c:406 [inline] cleanup_net+0x53c/0x6d8 net/core/net_namespace.c:569 process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153 worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296 kthread+0x2f0/0x378 kernel/kthread.c:255 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117 The buggy address belongs to the object at ffff8003496a3f80 which belongs to the cache net_namespace of size 7872 The buggy address is located 1796 bytes inside of 7872-byte region [ffff8003496a3f80, ffff8003496a5e40) The buggy address belongs to the page: page:ffff7e000d25a800 count:1 mapcount:0 mapping:ffff80036ce4b000 index:0x0 compound_mapcount: 0 flags: 0xffffe0000008100(slab|head) raw: 0ffffe0000008100 dead000000000100 dead000000000200 ffff80036ce4b000 raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8003496a4580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8003496a4600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8003496a4680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8003496a4700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8003496a4780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Fixes: 467fa15("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.") Reported-by: Hulk Robot <[email protected]> Signed-off-by: Mao Wenan <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> (cherry picked from commit c8a8879)

commit 5480e29 upstream. Some time ago the block layer was modified such that timeout handlers are called from thread context instead of interrupt context. Make it safe to run the iSCSI timeout handler in thread context. This patch fixes the following lockdep complaint: ================================ WARNING: inconsistent lock state 5.5.1-dbg+ Freescale#11 Not tainted -------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. kworker/7:1H/206 [HC0[0]:SC0[0]:HE1:SE1] takes: ffff88802d9827e8 (&(&session->frwd_lock)->rlock){+.?.}, at: iscsi_eh_cmd_timed_out+0xa6/0x6d0 [libiscsi] {IN-SOFTIRQ-W} state was registered at: lock_acquire+0x106/0x240 _raw_spin_lock+0x38/0x50 iscsi_check_transport_timeouts+0x3e/0x210 [libiscsi] call_timer_fn+0x132/0x470 __run_timers.part.0+0x39f/0x5b0 run_timer_softirq+0x63/0xc0 __do_softirq+0x12d/0x5fd irq_exit+0xb3/0x110 smp_apic_timer_interrupt+0x131/0x3d0 apic_timer_interrupt+0xf/0x20 default_idle+0x31/0x230 arch_cpu_idle+0x13/0x20 default_idle_call+0x53/0x60 do_idle+0x38a/0x3f0 cpu_startup_entry+0x24/0x30 start_secondary+0x222/0x290 secondary_startup_64+0xa4/0xb0 irq event stamp: 1383705 hardirqs last enabled at (1383705): [<ffffffff81aace5c>] _raw_spin_unlock_irq+0x2c/0x50 hardirqs last disabled at (1383704): [<ffffffff81aacb98>] _raw_spin_lock_irq+0x18/0x50 softirqs last enabled at (1383690): [<ffffffffa0e2efea>] iscsi_queuecommand+0x76a/0xa20 [libiscsi] softirqs last disabled at (1383682): [<ffffffffa0e2e998>] iscsi_queuecommand+0x118/0xa20 [libiscsi] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&session->frwd_lock)->rlock); <Interrupt> lock(&(&session->frwd_lock)->rlock); *** DEADLOCK *** 2 locks held by kworker/7:1H/206: #0: ffff8880d57bf928 ((wq_completion)kblockd){+.+.}, at: process_one_work+0x472/0xab0 Freescale#1: ffff88802b9c7de8 ((work_completion)(&q->timeout_work)){+.+.}, at: process_one_work+0x476/0xab0 stack backtrace: CPU: 7 PID: 206 Comm: kworker/7:1H Not tainted 5.5.1-dbg+ Freescale#11 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: kblockd blk_mq_timeout_work Call Trace: dump_stack+0xa5/0xe6 print_usage_bug.cold+0x232/0x23b mark_lock+0x8dc/0xa70 __lock_acquire+0xcea/0x2af0 lock_acquire+0x106/0x240 _raw_spin_lock+0x38/0x50 iscsi_eh_cmd_timed_out+0xa6/0x6d0 [libiscsi] scsi_times_out+0xf4/0x440 [scsi_mod] scsi_timeout+0x1d/0x20 [scsi_mod] blk_mq_check_expired+0x365/0x3a0 bt_iter+0xd6/0xf0 blk_mq_queue_tag_busy_iter+0x3de/0x650 blk_mq_timeout_work+0x1af/0x380 process_one_work+0x56d/0xab0 worker_thread+0x7a/0x5d0 kthread+0x1bc/0x210 ret_from_fork+0x24/0x30 Fixes: 287922e ("block: defer timeouts to a workqueue") Cc: Christoph Hellwig <[email protected]> Cc: Keith Busch <[email protected]> Cc: Lee Duncan <[email protected]> Cc: Chris Leech <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Lee Duncan <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…ll_sock(). [ Upstream commit cb66ddd ] When it is to cleanup net namespace, rds_tcp_exit_net() will call rds_tcp_kill_sock(), if t_sock is NULL, it will not call rds_conn_destroy(), rds_conn_path_destroy() and rds_tcp_conn_free() to free connection, and the worker cp_conn_w is not stopped, afterwards the net is freed in net_drop_ns(); While cp_conn_w rds_connect_worker() will call rds_tcp_conn_path_connect() and reference 'net' which has already been freed. In rds_tcp_conn_path_connect(), rds_tcp_set_callbacks() will set t_sock = sock before sock->ops->connect, but if connect() is failed, it will call rds_tcp_restore_callbacks() and set t_sock = NULL, if connect is always failed, rds_connect_worker() will try to reconnect all the time, so rds_tcp_kill_sock() will never to cancel worker cp_conn_w and free the connections. Therefore, the condition !tc->t_sock is not needed if it is going to do cleanup_net->rds_tcp_exit_net->rds_tcp_kill_sock, because tc->t_sock is always NULL, and there is on other path to cancel cp_conn_w and free connection. So this patch is to fix this. rds_tcp_kill_sock(): ... if (net != c_net || !tc->t_sock) ... Acked-by: Santosh Shilimkar <[email protected]> ================================================================== BUG: KASAN: use-after-free in inet_create+0xbcc/0xd28 net/ipv4/af_inet.c:340 Read of size 4 at addr ffff8003496a4684 by task kworker/u8:4/3721 CPU: 3 PID: 3721 Comm: kworker/u8:4 Not tainted 5.1.0 Freescale#11 Hardware name: linux,dummy-virt (DT) Workqueue: krdsd rds_connect_worker Call trace: dump_backtrace+0x0/0x3c0 arch/arm64/kernel/time.c:53 show_stack+0x28/0x38 arch/arm64/kernel/traps.c:152 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x120/0x188 lib/dump_stack.c:113 print_address_description+0x68/0x278 mm/kasan/report.c:253 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x21c/0x348 mm/kasan/report.c:409 __asan_report_load4_noabort+0x30/0x40 mm/kasan/report.c:429 inet_create+0xbcc/0xd28 net/ipv4/af_inet.c:340 __sock_create+0x4f8/0x770 net/socket.c:1276 sock_create_kern+0x50/0x68 net/socket.c:1322 rds_tcp_conn_path_connect+0x2b4/0x690 net/rds/tcp_connect.c:114 rds_connect_worker+0x108/0x1d0 net/rds/threads.c:175 process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153 worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296 kthread+0x2f0/0x378 kernel/kthread.c:255 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117 Allocated by task 687: save_stack mm/kasan/kasan.c:448 [inline] set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xd4/0x180 mm/kasan/kasan.c:553 kasan_slab_alloc+0x14/0x20 mm/kasan/kasan.c:490 slab_post_alloc_hook mm/slab.h:444 [inline] slab_alloc_node mm/slub.c:2705 [inline] slab_alloc mm/slub.c:2713 [inline] kmem_cache_alloc+0x14c/0x388 mm/slub.c:2718 kmem_cache_zalloc include/linux/slab.h:697 [inline] net_alloc net/core/net_namespace.c:384 [inline] copy_net_ns+0xc4/0x2d0 net/core/net_namespace.c:424 create_new_namespaces+0x300/0x658 kernel/nsproxy.c:107 unshare_nsproxy_namespaces+0xa0/0x198 kernel/nsproxy.c:206 ksys_unshare+0x340/0x628 kernel/fork.c:2577 __do_sys_unshare kernel/fork.c:2645 [inline] __se_sys_unshare kernel/fork.c:2643 [inline] __arm64_sys_unshare+0x38/0x58 kernel/fork.c:2643 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall arch/arm64/kernel/syscall.c:47 [inline] el0_svc_common+0x168/0x390 arch/arm64/kernel/syscall.c:83 el0_svc_handler+0x60/0xd0 arch/arm64/kernel/syscall.c:129 el0_svc+0x8/0xc arch/arm64/kernel/entry.S:960 Freed by task 264: save_stack mm/kasan/kasan.c:448 [inline] set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x114/0x220 mm/kasan/kasan.c:521 kasan_slab_free+0x10/0x18 mm/kasan/kasan.c:528 slab_free_hook mm/slub.c:1370 [inline] slab_free_freelist_hook mm/slub.c:1397 [inline] slab_free mm/slub.c:2952 [inline] kmem_cache_free+0xb8/0x3a8 mm/slub.c:2968 net_free net/core/net_namespace.c:400 [inline] net_drop_ns.part.6+0x78/0x90 net/core/net_namespace.c:407 net_drop_ns net/core/net_namespace.c:406 [inline] cleanup_net+0x53c/0x6d8 net/core/net_namespace.c:569 process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153 worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296 kthread+0x2f0/0x378 kernel/kthread.c:255 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117 The buggy address belongs to the object at ffff8003496a3f80 which belongs to the cache net_namespace of size 7872 The buggy address is located 1796 bytes inside of 7872-byte region [ffff8003496a3f80, ffff8003496a5e40) The buggy address belongs to the page: page:ffff7e000d25a800 count:1 mapcount:0 mapping:ffff80036ce4b000 index:0x0 compound_mapcount: 0 flags: 0xffffe0000008100(slab|head) raw: 0ffffe0000008100 dead000000000100 dead000000000200 ffff80036ce4b000 raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8003496a4580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8003496a4600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8003496a4680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8003496a4700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8003496a4780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Fixes: 467fa15("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.") Reported-by: Hulk Robot <[email protected]> Signed-off-by: Mao Wenan <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit 42dfa45 ] Using gcc's ASan, Changbin reports: ================================================================= ==7494==ERROR: LeakSanitizer: detected memory leaks Direct leak of 48 byte(s) in 1 object(s) allocated from: #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138) Freescale#1 0x5625e5330a5e in zalloc util/util.h:23 Freescale#2 0x5625e5330a9b in perf_counts__new util/counts.c:10 Freescale#3 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47 Freescale#4 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505 Freescale#5 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347 Freescale#6 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47 Freescale#7 0x5625e51528e6 in run_test tests/builtin-test.c:358 Freescale#8 0x5625e5152baf in test_and_print tests/builtin-test.c:388 Freescale#9 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 Freescale#10 0x5625e515572f in cmd_test tests/builtin-test.c:722 Freescale#11 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 Freescale#12 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 Freescale#13 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 Freescale#14 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 Freescale#15 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Indirect leak of 72 byte(s) in 1 object(s) allocated from: #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138) Freescale#1 0x5625e532560d in zalloc util/util.h:23 Freescale#2 0x5625e532566b in xyarray__new util/xyarray.c:10 Freescale#3 0x5625e5330aba in perf_counts__new util/counts.c:15 Freescale#4 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47 Freescale#5 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505 Freescale#6 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347 Freescale#7 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47 Freescale#8 0x5625e51528e6 in run_test tests/builtin-test.c:358 Freescale#9 0x5625e5152baf in test_and_print tests/builtin-test.c:388 Freescale#10 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 Freescale#11 0x5625e515572f in cmd_test tests/builtin-test.c:722 Freescale#12 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 Freescale#13 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 Freescale#14 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 Freescale#15 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 Freescale#16 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) His patch took care of evsel->prev_raw_counts, but the above backtraces are about evsel->counts, so fix that instead. Reported-by: Changbin Du <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

…_event_on_all_cpus test [ Upstream commit 93faa52 ] ================================================================= ==7497==ERROR: LeakSanitizer: detected memory leaks Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7f0333a88f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30) Freescale#1 0x5625e5326213 in cpu_map__trim_new util/cpumap.c:45 Freescale#2 0x5625e5326703 in cpu_map__read util/cpumap.c:103 Freescale#3 0x5625e53267ef in cpu_map__read_all_cpu_map util/cpumap.c:120 Freescale#4 0x5625e5326915 in cpu_map__new util/cpumap.c:135 Freescale#5 0x5625e517b355 in test__openat_syscall_event_on_all_cpus tests/openat-syscall-all-cpus.c:36 Freescale#6 0x5625e51528e6 in run_test tests/builtin-test.c:358 Freescale#7 0x5625e5152baf in test_and_print tests/builtin-test.c:388 Freescale#8 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 Freescale#9 0x5625e515572f in cmd_test tests/builtin-test.c:722 Freescale#10 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 Freescale#11 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 Freescale#12 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 Freescale#13 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 Freescale#14 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Fixes: f30a79b ("perf tools: Add reference counting for cpu_map object") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit d982b33 ] ================================================================= ==20875==ERROR: LeakSanitizer: detected memory leaks Direct leak of 1160 byte(s) in 1 object(s) allocated from: #0 0x7f1b6fc84138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138) Freescale#1 0x55bd50005599 in zalloc util/util.h:23 Freescale#2 0x55bd500068f5 in perf_evsel__newtp_idx util/evsel.c:327 Freescale#3 0x55bd4ff810fc in perf_evsel__newtp /home/work/linux/tools/perf/util/evsel.h:216 Freescale#4 0x55bd4ff81608 in test__perf_evsel__tp_sched_test tests/evsel-tp-sched.c:69 Freescale#5 0x55bd4ff528e6 in run_test tests/builtin-test.c:358 Freescale#6 0x55bd4ff52baf in test_and_print tests/builtin-test.c:388 Freescale#7 0x55bd4ff543fe in __cmd_test tests/builtin-test.c:583 Freescale#8 0x55bd4ff5572f in cmd_test tests/builtin-test.c:722 Freescale#9 0x55bd4ffc4087 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 Freescale#10 0x55bd4ffc45c6 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 Freescale#11 0x55bd4ffc49ca in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 Freescale#12 0x55bd4ffc5138 in main /home/changbin/work/linux/tools/perf/perf.c:520 Freescale#13 0x7f1b6e34809a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Indirect leak of 19 byte(s) in 1 object(s) allocated from: #0 0x7f1b6fc83f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30) Freescale#1 0x7f1b6e3ac30f in vasprintf (/lib/x86_64-linux-gnu/libc.so.6+0x8830f) Signed-off-by: Changbin Du <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Fixes: 6a6cd11 ("perf test: Add test for the sched tracepoint format fields") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit 5bebf74 ] Commit 83ff931 ("bcache: not use hard coded memset size in bch_cache_accounting_clear()") tries to make the code more easy to understand by removing the hard coded number with following change, void bch_cache_accounting_clear(...) { memset(&acc->total.cache_hits, 0, - sizeof(unsigned long) * 7); + sizeof(struct cache_stats)); } Unfortunately the change was wrong (it also tells us the original code was not easy to correctly understand). The hard coded number 7 is used because in struct cache_stats, 15 struct cache_stats { 16 struct kobject kobj; 17 18 unsigned long cache_hits; 19 unsigned long cache_misses; 20 unsigned long cache_bypass_hits; 21 unsigned long cache_bypass_misses; 22 unsigned long cache_readaheads; 23 unsigned long cache_miss_collisions; 24 unsigned long sectors_bypassed; 25 26 unsigned int rescale; 27 }; only members in LINE 18-24 want to be set to 0. It is wrong to use 'sizeof(struct cache_stats)' to replace 'sizeof(unsigned long) * 7), the memory objects behind acc->total is staled by this change. Сорокин Артем Сергеевич reports that by the following steps, kernel panic will be triggered, 1. Create new set: make-bcache -B /dev/nvme1n1 -C /dev/sda --wipe-bcache 2. Run in /sys/fs/bcache/<uuid>: echo 1 > clear_stats && cat stats_five_minute/cache_bypass_hits I can reproduce the panic and get following dmesg with KASAN enabled, [22613.172742] ================================================================== [22613.172862] BUG: KASAN: null-ptr-deref in sysfs_kf_seq_show+0x117/0x230 [22613.172864] Read of size 8 at addr 0000000000000000 by task cat/6753 [22613.172870] CPU: 1 PID: 6753 Comm: cat Not tainted 5.5.0-rc7-lp151.28.16-default+ #11 [22613.172872] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019 [22613.172873] Call Trace: [22613.172964] dump_stack+0x8b/0xbb [22613.172968] ? sysfs_kf_seq_show+0x117/0x230 [22613.172970] ? sysfs_kf_seq_show+0x117/0x230 [22613.173031] __kasan_report+0x176/0x192 [22613.173064] ? pr_cont_kernfs_name+0x40/0x60 [22613.173067] ? sysfs_kf_seq_show+0x117/0x230 [22613.173070] kasan_report+0xe/0x20 [22613.173072] sysfs_kf_seq_show+0x117/0x230 [22613.173105] seq_read+0x199/0x6d0 [22613.173110] vfs_read+0xa5/0x1a0 [22613.173113] ksys_read+0x110/0x160 [22613.173115] ? kernel_write+0xb0/0xb0 [22613.173177] do_syscall_64+0x77/0x290 [22613.173238] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [22613.173241] RIP: 0033:0x7fc2c886ac61 [22613.173244] Code: fe ff ff 48 8d 3d c7 a0 09 00 48 83 ec 08 e8 46 03 02 00 66 0f 1f 44 00 00 8b 05 ca fb 2c 00 48 63 ff 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 55 53 48 89 d5 48 89 [22613.173245] RSP: 002b:00007ffebe776d68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [22613.173248] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2c886ac61 [22613.173249] RDX: 0000000000020000 RSI: 00007fc2c8cca000 RDI: 0000000000000003 [22613.173250] RBP: 0000000000020000 R08: ffffffffffffffff R09: 0000000000000000 [22613.173251] R10: 000000000000038c R11: 0000000000000246 R12: 00007fc2c8cca000 [22613.173253] R13: 0000000000000003 R14: 00007fc2c8cca00f R15: 0000000000020000 [22613.173255] ================================================================== [22613.173256] Disabling lock debugging due to kernel taint [22613.173350] BUG: kernel NULL pointer dereference, address: 0000000000000000 [22613.178380] #PF: supervisor read access in kernel mode [22613.180959] #PF: error_code(0x0000) - not-present page [22613.183444] PGD 0 P4D 0 [22613.184867] Oops: 0000 [#1] SMP KASAN PTI [22613.186797] CPU: 1 PID: 6753 Comm: cat Tainted: G B 5.5.0-rc7-lp151.28.16-default+ #11 [22613.191253] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019 [22613.196706] RIP: 0010:sysfs_kf_seq_show+0x117/0x230 [22613.199097] Code: ff 48 8b 0b 48 8b 44 24 08 48 01 e9 eb a6 31 f6 48 89 cf ba 00 10 00 00 48 89 4c 24 10 e8 b1 e6 e9 ff 4c 89 ff e8 19 07 ea ff <49> 8b 07 48 85 c0 48 89 44 24 08 0f 84 91 00 00 00 49 8b 6d 00 48 [22613.208016] RSP: 0018:ffff8881d4f8fd78 EFLAGS: 00010246 [22613.210448] RAX: 0000000000000000 RBX: ffff8881eb99b180 RCX: ffffffff810d9ef6 [22613.213691] RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246 [22613.216893] RBP: 0000000000001000 R08: fffffbfff072ddcd R09: fffffbfff072ddcd [22613.220075] R10: 0000000000000001 R11: fffffbfff072ddcc R12: ffff8881de5c0200 [22613.223256] R13: ffff8881ed175500 R14: ffff8881eb99b198 R15: 0000000000000000 [22613.226290] FS: 00007fc2c8d3d500(0000) GS:ffff8881f2a80000(0000) knlGS:0000000000000000 [22613.229637] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [22613.231993] CR2: 0000000000000000 CR3: 00000001ec89a004 CR4: 00000000003606e0 [22613.234909] Call Trace: [22613.235931] seq_read+0x199/0x6d0 [22613.237259] vfs_read+0xa5/0x1a0 [22613.239229] ksys_read+0x110/0x160 [22613.240590] ? kernel_write+0xb0/0xb0 [22613.242040] do_syscall_64+0x77/0x290 [22613.243625] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [22613.245450] RIP: 0033:0x7fc2c886ac61 [22613.246706] Code: fe ff ff 48 8d 3d c7 a0 09 00 48 83 ec 08 e8 46 03 02 00 66 0f 1f 44 00 00 8b 05 ca fb 2c 00 48 63 ff 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 55 53 48 89 d5 48 89 [22613.253296] RSP: 002b:00007ffebe776d68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [22613.255835] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2c886ac61 [22613.258472] RDX: 0000000000020000 RSI: 00007fc2c8cca000 RDI: 0000000000000003 [22613.260807] RBP: 0000000000020000 R08: ffffffffffffffff R09: 0000000000000000 [22613.263188] R10: 000000000000038c R11: 0000000000000246 R12: 00007fc2c8cca000 [22613.265598] R13: 0000000000000003 R14: 00007fc2c8cca00f R15: 0000000000020000 [22613.268729] Modules linked in: scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs vmw_vsock_vmci_transport vsock fuse bnep kvm_intel kvm irqbypass crc32_pclmul crc32c_intel ghash_clmulni_intel snd_ens1371 snd_ac97_codec ac97_bus bcache snd_pcm btusb btrtl btbcm btintel crc64 aesni_intel glue_helper crypto_simd vmw_balloon cryptd bluetooth snd_timer snd_rawmidi snd joydev pcspkr e1000 rfkill vmw_vmci soundcore ecdh_generic ecc gameport i2c_piix4 mptctl ac button hid_generic usbhid sr_mod cdrom ata_generic ehci_pci vmwgfx uhci_hcd drm_kms_helper syscopyarea serio_raw sysfillrect sysimgblt fb_sys_fops ttm ehci_hcd mptspi scsi_transport_spi mptscsih ata_piix mptbase ahci usbcore libahci drm sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua [22613.292429] CR2: 0000000000000000 [22613.293563] ---[ end trace a074b26a8508f378 ]--- [22613.295138] RIP: 0010:sysfs_kf_seq_show+0x117/0x230 [22613.296769] Code: ff 48 8b 0b 48 8b 44 24 08 48 01 e9 eb a6 31 f6 48 89 cf ba 00 10 00 00 48 89 4c 24 10 e8 b1 e6 e9 ff 4c 89 ff e8 19 07 ea ff <49> 8b 07 48 85 c0 48 89 44 24 08 0f 84 91 00 00 00 49 8b 6d 00 48 [22613.303553] RSP: 0018:ffff8881d4f8fd78 EFLAGS: 00010246 [22613.305280] RAX: 0000000000000000 RBX: ffff8881eb99b180 RCX: ffffffff810d9ef6 [22613.307924] RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246 [22613.310272] RBP: 0000000000001000 R08: fffffbfff072ddcd R09: fffffbfff072ddcd [22613.312685] R10: 0000000000000001 R11: fffffbfff072ddcc R12: ffff8881de5c0200 [22613.315076] R13: ffff8881ed175500 R14: ffff8881eb99b198 R15: 0000000000000000 [22613.318116] FS: 00007fc2c8d3d500(0000) GS:ffff8881f2a80000(0000) knlGS:0000000000000000 [22613.320743] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [22613.322628] CR2: 0000000000000000 CR3: 00000001ec89a004 CR4: 00000000003606e0 Here this patch fixes the following problem by explicity set all the 7 members to 0 in bch_cache_accounting_clear(). Reported-by: Сорокин Артем Сергеевич <[email protected]> Signed-off-by: Coly Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

commit ca4463b upstream. The VT_DISALLOCATE ioctl can free a virtual console while tty_release() is still running, causing a use-after-free in con_shutdown(). This occurs because VT_DISALLOCATE considers a virtual console's 'struct vc_data' to be unused as soon as the corresponding tty's refcount hits 0. But actually it may be still being closed. Fix this by making vc_data be reference-counted via the embedded 'struct tty_port'. A newly allocated virtual console has refcount 1. Opening it for the first time increments the refcount to 2. Closing it for the last time decrements the refcount (in tty_operations::cleanup() so that it happens late enough), as does VT_DISALLOCATE. Reproducer: #include <fcntl.h> #include <linux/vt.h> #include <sys/ioctl.h> #include <unistd.h> int main() { if (fork()) { for (;;) close(open("/dev/tty5", O_RDWR)); } else { int fd = open("/dev/tty10", O_RDWR); for (;;) ioctl(fd, VT_DISALLOCATE, 5); } } KASAN report: BUG: KASAN: use-after-free in con_shutdown+0x76/0x80 drivers/tty/vt/vt.c:3278 Write of size 8 at addr ffff88806a4ec108 by task syz_vt/129 CPU: 0 PID: 129 Comm: syz_vt Not tainted 5.6.0-rc2 #11 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014 Call Trace: [...] con_shutdown+0x76/0x80 drivers/tty/vt/vt.c:3278 release_tty+0xa8/0x410 drivers/tty/tty_io.c:1514 tty_release_struct+0x34/0x50 drivers/tty/tty_io.c:1629 tty_release+0x984/0xed0 drivers/tty/tty_io.c:1789 [...] Allocated by task 129: [...] kzalloc include/linux/slab.h:669 [inline] vc_allocate drivers/tty/vt/vt.c:1085 [inline] vc_allocate+0x1ac/0x680 drivers/tty/vt/vt.c:1066 con_install+0x4d/0x3f0 drivers/tty/vt/vt.c:3229 tty_driver_install_tty drivers/tty/tty_io.c:1228 [inline] tty_init_dev+0x94/0x350 drivers/tty/tty_io.c:1341 tty_open_by_driver drivers/tty/tty_io.c:1987 [inline] tty_open+0x3ca/0xb30 drivers/tty/tty_io.c:2035 [...] Freed by task 130: [...] kfree+0xbf/0x1e0 mm/slab.c:3757 vt_disallocate drivers/tty/vt/vt_ioctl.c:300 [inline] vt_ioctl+0x16dc/0x1e30 drivers/tty/vt/vt_ioctl.c:818 tty_ioctl+0x9db/0x11b0 drivers/tty/tty_io.c:2660 [...] Fixes: 4001d7b ("vt: push down the tty lock so we can see what is left to tackle") Cc: <[email protected]> # v3.4+ Reported-by: [email protected] Acked-by: Jiri Slaby <[email protected]> Signed-off-by: Eric Biggers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit 1bc7896 ] When experimenting with bpf_send_signal() helper in our production environment (5.2 based), we experienced a deadlock in NMI mode: Freescale#5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24 Freescale#6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012 Freescale#7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd Freescale#8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55 Freescale#9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602 Freescale#10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a Freescale#11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227 Freescale#12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140 Freescale#13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf Freescale#14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09 Freescale#15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47 Freescale#16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d Freescale#17 [ffffc9002219fc90] schedule at ffffffff81a3e219 Freescale#18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9 Freescale#19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529 Freescale#20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc Freescale#21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c Freescale#22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602 Freescale#23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068 The above call stack is actually very similar to an issue reported by Commit eac9153 ("bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack()") by Song Liu. The only difference is bpf_send_signal() helper instead of bpf_get_stack() helper. The above deadlock is triggered with a perf_sw_event. Similar to Commit eac9153, the below almost identical reproducer used tracepoint point sched/sched_switch so the issue can be easily caught. /* stress_test.c */ #include <stdio.h> #include <stdlib.h> #include <sys/mman.h> #include <pthread.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #define THREAD_COUNT 1000 char *filename; void *worker(void *p) { void *ptr; int fd; char *pptr; fd = open(filename, O_RDONLY); if (fd < 0) return NULL; while (1) { struct timespec ts = {0, 1000 + rand() % 2000}; ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0); usleep(1); if (ptr == MAP_FAILED) { printf("failed to mmap\n"); break; } munmap(ptr, 4096 * 64); usleep(1); pptr = malloc(1); usleep(1); pptr[0] = 1; usleep(1); free(pptr); usleep(1); nanosleep(&ts, NULL); } close(fd); return NULL; } int main(int argc, char *argv[]) { void *ptr; int i; pthread_t threads[THREAD_COUNT]; if (argc < 2) return 0; filename = argv[1]; for (i = 0; i < THREAD_COUNT; i++) { if (pthread_create(threads + i, NULL, worker, NULL)) { fprintf(stderr, "Error creating thread\n"); return 0; } } for (i = 0; i < THREAD_COUNT; i++) pthread_join(threads[i], NULL); return 0; } and the following command: 1. run `stress_test /bin/ls` in one windown 2. hack bcc trace.py with the following change: # --- a/tools/trace.py # +++ b/tools/trace.py @@ -513,6 +513,7 @@ BPF_PERF_OUTPUT(%s); __data.tgid = __tgid; __data.pid = __pid; bpf_get_current_comm(&__data.comm, sizeof(__data.comm)); + bpf_send_signal(10); %s %s %s.perf_submit(%s, &__data, sizeof(__data)); 3. in a different window run ./trace.py -p $(pidof stress_test) t:sched:sched_switch The deadlock can be reproduced in our production system. Similar to Song's fix, the fix is to delay sending signal if irqs is disabled to avoid deadlocks involving with rq_lock. With this change, my above stress-test in our production system won't cause deadlock any more. I also implemented a scale-down version of reproducer in the selftest (a subsequent commit). With latest bpf-next, it complains for the following potential deadlock. [ 32.832450] -> Freescale#1 (&p->pi_lock){-.-.}: [ 32.833100] _raw_spin_lock_irqsave+0x44/0x80 [ 32.833696] task_rq_lock+0x2c/0xa0 [ 32.834182] task_sched_runtime+0x59/0xd0 [ 32.834721] thread_group_cputime+0x250/0x270 [ 32.835304] thread_group_cputime_adjusted+0x2e/0x70 [ 32.835959] do_task_stat+0x8a7/0xb80 [ 32.836461] proc_single_show+0x51/0xb0 ... [ 32.839512] -> #0 (&(&sighand->siglock)->rlock){....}: [ 32.840275] __lock_acquire+0x1358/0x1a20 [ 32.840826] lock_acquire+0xc7/0x1d0 [ 32.841309] _raw_spin_lock_irqsave+0x44/0x80 [ 32.841916] __lock_task_sighand+0x79/0x160 [ 32.842465] do_send_sig_info+0x35/0x90 [ 32.842977] bpf_send_signal+0xa/0x10 [ 32.843464] bpf_prog_bc13ed9e4d3163e3_send_signal_tp_sched+0x465/0x1000 [ 32.844301] trace_call_bpf+0x115/0x270 [ 32.844809] perf_trace_run_bpf_submit+0x4a/0xc0 [ 32.845411] perf_trace_sched_switch+0x10f/0x180 [ 32.846014] __schedule+0x45d/0x880 [ 32.846483] schedule+0x5f/0xd0 ... [ 32.853148] Chain exists of: [ 32.853148] &(&sighand->siglock)->rlock --> &p->pi_lock --> &rq->lock [ 32.853148] [ 32.854451] Possible unsafe locking scenario: [ 32.854451] [ 32.855173] CPU0 CPU1 [ 32.855745] ---- ---- [ 32.856278] lock(&rq->lock); [ 32.856671] lock(&p->pi_lock); [ 32.857332] lock(&rq->lock); [ 32.857999] lock(&(&sighand->siglock)->rlock); Deadlock happens on CPU0 when it tries to acquire &sighand->siglock but it has been held by CPU1 and CPU1 tries to grab &rq->lock and cannot get it. This is not exactly the callstack in our production environment, but sympotom is similar and both locks are using spin_lock_irqsave() to acquire the lock, and both involves rq_lock. The fix to delay sending signal when irq is disabled also fixed this issue. Signed-off-by: Yonghong Song <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Cc: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit ca4463b upstream. The VT_DISALLOCATE ioctl can free a virtual console while tty_release() is still running, causing a use-after-free in con_shutdown(). This occurs because VT_DISALLOCATE considers a virtual console's 'struct vc_data' to be unused as soon as the corresponding tty's refcount hits 0. But actually it may be still being closed. Fix this by making vc_data be reference-counted via the embedded 'struct tty_port'. A newly allocated virtual console has refcount 1. Opening it for the first time increments the refcount to 2. Closing it for the last time decrements the refcount (in tty_operations::cleanup() so that it happens late enough), as does VT_DISALLOCATE. Reproducer: #include <fcntl.h> #include <linux/vt.h> #include <sys/ioctl.h> #include <unistd.h> int main() { if (fork()) { for (;;) close(open("/dev/tty5", O_RDWR)); } else { int fd = open("/dev/tty10", O_RDWR); for (;;) ioctl(fd, VT_DISALLOCATE, 5); } } KASAN report: BUG: KASAN: use-after-free in con_shutdown+0x76/0x80 drivers/tty/vt/vt.c:3278 Write of size 8 at addr ffff88806a4ec108 by task syz_vt/129 CPU: 0 PID: 129 Comm: syz_vt Not tainted 5.6.0-rc2 #11 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014 Call Trace: [...] con_shutdown+0x76/0x80 drivers/tty/vt/vt.c:3278 release_tty+0xa8/0x410 drivers/tty/tty_io.c:1514 tty_release_struct+0x34/0x50 drivers/tty/tty_io.c:1629 tty_release+0x984/0xed0 drivers/tty/tty_io.c:1789 [...] Allocated by task 129: [...] kzalloc include/linux/slab.h:669 [inline] vc_allocate drivers/tty/vt/vt.c:1085 [inline] vc_allocate+0x1ac/0x680 drivers/tty/vt/vt.c:1066 con_install+0x4d/0x3f0 drivers/tty/vt/vt.c:3229 tty_driver_install_tty drivers/tty/tty_io.c:1228 [inline] tty_init_dev+0x94/0x350 drivers/tty/tty_io.c:1341 tty_open_by_driver drivers/tty/tty_io.c:1987 [inline] tty_open+0x3ca/0xb30 drivers/tty/tty_io.c:2035 [...] Freed by task 130: [...] kfree+0xbf/0x1e0 mm/slab.c:3757 vt_disallocate drivers/tty/vt/vt_ioctl.c:300 [inline] vt_ioctl+0x16dc/0x1e30 drivers/tty/vt/vt_ioctl.c:818 tty_ioctl+0x9db/0x11b0 drivers/tty/tty_io.c:2660 [...] Fixes: 4001d7b ("vt: push down the tty lock so we can see what is left to tackle") Cc: <[email protected]> # v3.4+ Reported-by: [email protected] Acked-by: Jiri Slaby <[email protected]> Signed-off-by: Eric Biggers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit 18f855e ] Stefano reported a crash with using SQPOLL with io_uring: BUG: kernel NULL pointer dereference, address: 00000000000003b0 CPU: 2 PID: 1307 Comm: io_uring-sq Not tainted 5.7.0-rc7 Freescale#11 RIP: 0010:task_numa_work+0x4f/0x2c0 Call Trace: task_work_run+0x68/0xa0 io_sq_thread+0x252/0x3d0 kthread+0xf9/0x130 ret_from_fork+0x35/0x40 which is task_numa_work() oopsing on current->mm being NULL. The task work is queued by task_tick_numa(), which checks if current->mm is NULL at the time of the call. But this state isn't necessarily persistent, if the kthread is using use_mm() to temporarily adopt the mm of a task. Change the task_tick_numa() check to exclude kernel threads in general, as it doesn't make sense to attempt ot balance for kthreads anyway. Reported-by: Stefano Garzarella <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit e24c644 ] I compiled with AddressSanitizer and I had these memory leaks while I was using the tep_parse_format function: Direct leak of 28 byte(s) in 4 object(s) allocated from: #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe) Freescale#1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985 Freescale#2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140 Freescale#3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206 Freescale#4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291 Freescale#5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299 Freescale#6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849 Freescale#7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161 Freescale#8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207 Freescale#9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786 Freescale#10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285 Freescale#11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369 Freescale#12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335 Freescale#13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389 Freescale#14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431 Freescale#15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251 Freescale#16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284 Freescale#17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593 Freescale#18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727 Freescale#19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048 Freescale#20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127 Freescale#21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152 Freescale#22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252 Freescale#23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347 Freescale#24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461 Freescale#25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673 Freescale#26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) The token variable in the process_dynamic_array_len function is allocated in the read_expect_type function, but is not freed before calling the read_token function. Free the token variable before calling read_token in order to plug the leak. Signed-off-by: Philippe Duplessis-Guindon <[email protected]> Reviewed-by: Steven Rostedt (VMware) <[email protected]> Link: https://lore.kernel.org/linux-trace-devel/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit b12eea5 ] The evsel->unit borrows a pointer of pmu event or alias instead of owns a string. But tool event (duration_time) passes a result of strdup() caused a leak. It was found by ASAN during metric test: Direct leak of 210 byte(s) in 70 object(s) allocated from: #0 0x7fe366fca0b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5) Freescale#1 0x559fbbcc6ea3 in add_event_tool util/parse-events.c:414 Freescale#2 0x559fbbcc6ea3 in parse_events_add_tool util/parse-events.c:1414 Freescale#3 0x559fbbd8474d in parse_events_parse util/parse-events.y:439 Freescale#4 0x559fbbcc95da in parse_events__scanner util/parse-events.c:2096 Freescale#5 0x559fbbcc95da in __parse_events util/parse-events.c:2141 Freescale#6 0x559fbbc28555 in check_parse_id tests/pmu-events.c:406 Freescale#7 0x559fbbc28555 in check_parse_id tests/pmu-events.c:393 Freescale#8 0x559fbbc28555 in check_parse_cpu tests/pmu-events.c:415 Freescale#9 0x559fbbc28555 in test_parsing tests/pmu-events.c:498 Freescale#10 0x559fbbc0109b in run_test tests/builtin-test.c:410 Freescale#11 0x559fbbc0109b in test_and_print tests/builtin-test.c:440 Freescale#12 0x559fbbc03e69 in __cmd_test tests/builtin-test.c:695 Freescale#13 0x559fbbc03e69 in cmd_test tests/builtin-test.c:807 Freescale#14 0x559fbbc691f4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 Freescale#15 0x559fbbb071a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 Freescale#16 0x559fbbb071a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 Freescale#17 0x559fbbb071a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 Freescale#18 0x7fe366b68cc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: f0fbb11 ("perf stat: Implement duration_time as a proper event") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit d26383d ] The following leaks were detected by ASAN: Indirect leak of 360 byte(s) in 9 object(s) allocated from: #0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e) Freescale#1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333 Freescale#2 0x560578f752fc in perf_pmu_parse util/pmu.y:59 Freescale#3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73 Freescale#4 0x560578e07045 in test__pmu tests/pmu.c:155 Freescale#5 0x560578de109b in run_test tests/builtin-test.c:410 Freescale#6 0x560578de109b in test_and_print tests/builtin-test.c:440 Freescale#7 0x560578de401a in __cmd_test tests/builtin-test.c:661 Freescale#8 0x560578de401a in cmd_test tests/builtin-test.c:807 Freescale#9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 Freescale#10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 Freescale#11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 Freescale#12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 Freescale#13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: cff7f95 ("perf tests: Move pmu tests into separate object") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

commit ca4463b upstream. The VT_DISALLOCATE ioctl can free a virtual console while tty_release() is still running, causing a use-after-free in con_shutdown(). This occurs because VT_DISALLOCATE considers a virtual console's 'struct vc_data' to be unused as soon as the corresponding tty's refcount hits 0. But actually it may be still being closed. Fix this by making vc_data be reference-counted via the embedded 'struct tty_port'. A newly allocated virtual console has refcount 1. Opening it for the first time increments the refcount to 2. Closing it for the last time decrements the refcount (in tty_operations::cleanup() so that it happens late enough), as does VT_DISALLOCATE. Reproducer: #include <fcntl.h> #include <linux/vt.h> #include <sys/ioctl.h> #include <unistd.h> int main() { if (fork()) { for (;;) close(open("/dev/tty5", O_RDWR)); } else { int fd = open("/dev/tty10", O_RDWR); for (;;) ioctl(fd, VT_DISALLOCATE, 5); } } KASAN report: BUG: KASAN: use-after-free in con_shutdown+0x76/0x80 drivers/tty/vt/vt.c:3278 Write of size 8 at addr ffff88806a4ec108 by task syz_vt/129 CPU: 0 PID: 129 Comm: syz_vt Not tainted 5.6.0-rc2 Freescale#11 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014 Call Trace: [...] con_shutdown+0x76/0x80 drivers/tty/vt/vt.c:3278 release_tty+0xa8/0x410 drivers/tty/tty_io.c:1514 tty_release_struct+0x34/0x50 drivers/tty/tty_io.c:1629 tty_release+0x984/0xed0 drivers/tty/tty_io.c:1789 [...] Allocated by task 129: [...] kzalloc include/linux/slab.h:669 [inline] vc_allocate drivers/tty/vt/vt.c:1085 [inline] vc_allocate+0x1ac/0x680 drivers/tty/vt/vt.c:1066 con_install+0x4d/0x3f0 drivers/tty/vt/vt.c:3229 tty_driver_install_tty drivers/tty/tty_io.c:1228 [inline] tty_init_dev+0x94/0x350 drivers/tty/tty_io.c:1341 tty_open_by_driver drivers/tty/tty_io.c:1987 [inline] tty_open+0x3ca/0xb30 drivers/tty/tty_io.c:2035 [...] Freed by task 130: [...] kfree+0xbf/0x1e0 mm/slab.c:3757 vt_disallocate drivers/tty/vt/vt_ioctl.c:300 [inline] vt_ioctl+0x16dc/0x1e30 drivers/tty/vt/vt_ioctl.c:818 tty_ioctl+0x9db/0x11b0 drivers/tty/tty_io.c:2660 [...] Fixes: 4001d7b ("vt: push down the tty lock so we can see what is left to tackle") Cc: <[email protected]> # v3.4+ Reported-by: [email protected] Acked-by: Jiri Slaby <[email protected]> Signed-off-by: Eric Biggers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit 18f855e ] Stefano reported a crash with using SQPOLL with io_uring: BUG: kernel NULL pointer dereference, address: 00000000000003b0 CPU: 2 PID: 1307 Comm: io_uring-sq Not tainted 5.7.0-rc7 Freescale#11 RIP: 0010:task_numa_work+0x4f/0x2c0 Call Trace: task_work_run+0x68/0xa0 io_sq_thread+0x252/0x3d0 kthread+0xf9/0x130 ret_from_fork+0x35/0x40 which is task_numa_work() oopsing on current->mm being NULL. The task work is queued by task_tick_numa(), which checks if current->mm is NULL at the time of the call. But this state isn't necessarily persistent, if the kthread is using use_mm() to temporarily adopt the mm of a task. Change the task_tick_numa() check to exclude kernel threads in general, as it doesn't make sense to attempt ot balance for kthreads anyway. Reported-by: Stefano Garzarella <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit e24c644 ] I compiled with AddressSanitizer and I had these memory leaks while I was using the tep_parse_format function: Direct leak of 28 byte(s) in 4 object(s) allocated from: #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe) Freescale#1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985 Freescale#2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140 Freescale#3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206 Freescale#4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291 Freescale#5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299 Freescale#6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849 Freescale#7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161 Freescale#8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207 Freescale#9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786 Freescale#10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285 Freescale#11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369 Freescale#12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335 Freescale#13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389 Freescale#14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431 Freescale#15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251 Freescale#16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284 Freescale#17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593 Freescale#18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727 Freescale#19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048 Freescale#20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127 Freescale#21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152 Freescale#22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252 Freescale#23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347 Freescale#24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461 Freescale#25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673 Freescale#26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) The token variable in the process_dynamic_array_len function is allocated in the read_expect_type function, but is not freed before calling the read_token function. Free the token variable before calling read_token in order to plug the leak. Signed-off-by: Philippe Duplessis-Guindon <[email protected]> Reviewed-by: Steven Rostedt (VMware) <[email protected]> Link: https://lore.kernel.org/linux-trace-devel/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit d26383d ] The following leaks were detected by ASAN: Indirect leak of 360 byte(s) in 9 object(s) allocated from: #0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e) Freescale#1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333 Freescale#2 0x560578f752fc in perf_pmu_parse util/pmu.y:59 Freescale#3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73 Freescale#4 0x560578e07045 in test__pmu tests/pmu.c:155 Freescale#5 0x560578de109b in run_test tests/builtin-test.c:410 Freescale#6 0x560578de109b in test_and_print tests/builtin-test.c:440 Freescale#7 0x560578de401a in __cmd_test tests/builtin-test.c:661 Freescale#8 0x560578de401a in cmd_test tests/builtin-test.c:807 Freescale#9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 Freescale#10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 Freescale#11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 Freescale#12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 Freescale#13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: cff7f95 ("perf tests: Move pmu tests into separate object") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

commit 66d204a upstream. Very sporadically I had test case btrfs/069 from fstests hanging (for years, it is not a recent regression), with the following traces in dmesg/syslog: [162301.160628] BTRFS info (device sdc): dev_replace from /dev/sdd (devid 2) to /dev/sdg started [162301.181196] BTRFS info (device sdc): scrub: finished on devid 4 with status: 0 [162301.287162] BTRFS info (device sdc): dev_replace from /dev/sdd (devid 2) to /dev/sdg finished [162513.513792] INFO: task btrfs-transacti:1356167 blocked for more than 120 seconds. [162513.514318] Not tainted 5.9.0-rc6-btrfs-next-69 Freescale#1 [162513.514522] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.514747] task:btrfs-transacti state:D stack: 0 pid:1356167 ppid: 2 flags:0x00004000 [162513.514751] Call Trace: [162513.514761] __schedule+0x5ce/0xd00 [162513.514765] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.514771] schedule+0x46/0xf0 [162513.514844] wait_current_trans+0xde/0x140 [btrfs] [162513.514850] ? finish_wait+0x90/0x90 [162513.514864] start_transaction+0x37c/0x5f0 [btrfs] [162513.514879] transaction_kthread+0xa4/0x170 [btrfs] [162513.514891] ? btrfs_cleanup_transaction+0x660/0x660 [btrfs] [162513.514894] kthread+0x153/0x170 [162513.514897] ? kthread_stop+0x2c0/0x2c0 [162513.514902] ret_from_fork+0x22/0x30 [162513.514916] INFO: task fsstress:1356184 blocked for more than 120 seconds. [162513.515192] Not tainted 5.9.0-rc6-btrfs-next-69 Freescale#1 [162513.515431] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.515680] task:fsstress state:D stack: 0 pid:1356184 ppid:1356177 flags:0x00004000 [162513.515682] Call Trace: [162513.515688] __schedule+0x5ce/0xd00 [162513.515691] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.515697] schedule+0x46/0xf0 [162513.515712] wait_current_trans+0xde/0x140 [btrfs] [162513.515716] ? finish_wait+0x90/0x90 [162513.515729] start_transaction+0x37c/0x5f0 [btrfs] [162513.515743] btrfs_attach_transaction_barrier+0x1f/0x50 [btrfs] [162513.515753] btrfs_sync_fs+0x61/0x1c0 [btrfs] [162513.515758] ? __ia32_sys_fdatasync+0x20/0x20 [162513.515761] iterate_supers+0x87/0xf0 [162513.515765] ksys_sync+0x60/0xb0 [162513.515768] __do_sys_sync+0xa/0x10 [162513.515771] do_syscall_64+0x33/0x80 [162513.515774] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.515781] RIP: 0033:0x7f5238f50bd7 [162513.515782] Code: Bad RIP value. [162513.515784] RSP: 002b:00007fff67b978e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2 [162513.515786] RAX: ffffffffffffffda RBX: 000055b1fad2c560 RCX: 00007f5238f50bd7 [162513.515788] RDX: 00000000ffffffff RSI: 000000000daf0e74 RDI: 000000000000003a [162513.515789] RBP: 0000000000000032 R08: 000000000000000a R09: 00007f5239019be0 [162513.515791] R10: fffffffffffff24f R11: 0000000000000206 R12: 000000000000003a [162513.515792] R13: 00007fff67b97950 R14: 00007fff67b97906 R15: 000055b1fad1a340 [162513.515804] INFO: task fsstress:1356185 blocked for more than 120 seconds. [162513.516064] Not tainted 5.9.0-rc6-btrfs-next-69 Freescale#1 [162513.516329] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.516617] task:fsstress state:D stack: 0 pid:1356185 ppid:1356177 flags:0x00000000 [162513.516620] Call Trace: [162513.516625] __schedule+0x5ce/0xd00 [162513.516628] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.516634] schedule+0x46/0xf0 [162513.516647] wait_current_trans+0xde/0x140 [btrfs] [162513.516650] ? finish_wait+0x90/0x90 [162513.516662] start_transaction+0x4d7/0x5f0 [btrfs] [162513.516679] btrfs_setxattr_trans+0x3c/0x100 [btrfs] [162513.516686] __vfs_setxattr+0x66/0x80 [162513.516691] __vfs_setxattr_noperm+0x70/0x200 [162513.516697] vfs_setxattr+0x6b/0x120 [162513.516703] setxattr+0x125/0x240 [162513.516709] ? lock_acquire+0xb1/0x480 [162513.516712] ? mnt_want_write+0x20/0x50 [162513.516721] ? rcu_read_lock_any_held+0x8e/0xb0 [162513.516723] ? preempt_count_add+0x49/0xa0 [162513.516725] ? __sb_start_write+0x19b/0x290 [162513.516727] ? preempt_count_add+0x49/0xa0 [162513.516732] path_setxattr+0xba/0xd0 [162513.516739] __x64_sys_setxattr+0x27/0x30 [162513.516741] do_syscall_64+0x33/0x80 [162513.516743] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.516745] RIP: 0033:0x7f5238f56d5a [162513.516746] Code: Bad RIP value. [162513.516748] RSP: 002b:00007fff67b97868 EFLAGS: 00000202 ORIG_RAX: 00000000000000bc [162513.516750] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f5238f56d5a [162513.516751] RDX: 000055b1fbb0d5a0 RSI: 00007fff67b978a0 RDI: 000055b1fbb0d470 [162513.516753] RBP: 000055b1fbb0d5a0 R08: 0000000000000001 R09: 00007fff67b97700 [162513.516754] R10: 0000000000000004 R11: 0000000000000202 R12: 0000000000000004 [162513.516756] R13: 0000000000000024 R14: 0000000000000001 R15: 00007fff67b978a0 [162513.516767] INFO: task fsstress:1356196 blocked for more than 120 seconds. [162513.517064] Not tainted 5.9.0-rc6-btrfs-next-69 Freescale#1 [162513.517365] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.517763] task:fsstress state:D stack: 0 pid:1356196 ppid:1356177 flags:0x00004000 [162513.517780] Call Trace: [162513.517786] __schedule+0x5ce/0xd00 [162513.517789] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.517796] schedule+0x46/0xf0 [162513.517810] wait_current_trans+0xde/0x140 [btrfs] [162513.517814] ? finish_wait+0x90/0x90 [162513.517829] start_transaction+0x37c/0x5f0 [btrfs] [162513.517845] btrfs_attach_transaction_barrier+0x1f/0x50 [btrfs] [162513.517857] btrfs_sync_fs+0x61/0x1c0 [btrfs] [162513.517862] ? __ia32_sys_fdatasync+0x20/0x20 [162513.517865] iterate_supers+0x87/0xf0 [162513.517869] ksys_sync+0x60/0xb0 [162513.517872] __do_sys_sync+0xa/0x10 [162513.517875] do_syscall_64+0x33/0x80 [162513.517878] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.517881] RIP: 0033:0x7f5238f50bd7 [162513.517883] Code: Bad RIP value. [162513.517885] RSP: 002b:00007fff67b978e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2 [162513.517887] RAX: ffffffffffffffda RBX: 000055b1fad2c560 RCX: 00007f5238f50bd7 [162513.517889] RDX: 0000000000000000 RSI: 000000007660add2 RDI: 0000000000000053 [162513.517891] RBP: 0000000000000032 R08: 0000000000000067 R09: 00007f5239019be0 [162513.517893] R10: fffffffffffff24f R11: 0000000000000206 R12: 0000000000000053 [162513.517895] R13: 00007fff67b97950 R14: 00007fff67b97906 R15: 000055b1fad1a340 [162513.517908] INFO: task fsstress:1356197 blocked for more than 120 seconds. [162513.518298] Not tainted 5.9.0-rc6-btrfs-next-69 Freescale#1 [162513.518672] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.519157] task:fsstress state:D stack: 0 pid:1356197 ppid:1356177 flags:0x00000000 [162513.519160] Call Trace: [162513.519165] __schedule+0x5ce/0xd00 [162513.519168] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.519174] schedule+0x46/0xf0 [162513.519190] wait_current_trans+0xde/0x140 [btrfs] [162513.519193] ? finish_wait+0x90/0x90 [162513.519206] start_transaction+0x4d7/0x5f0 [btrfs] [162513.519222] btrfs_create+0x57/0x200 [btrfs] [162513.519230] lookup_open+0x522/0x650 [162513.519246] path_openat+0x2b8/0xa50 [162513.519270] do_filp_open+0x91/0x100 [162513.519275] ? find_held_lock+0x32/0x90 [162513.519280] ? lock_acquired+0x33b/0x470 [162513.519285] ? do_raw_spin_unlock+0x4b/0xc0 [162513.519287] ? _raw_spin_unlock+0x29/0x40 [162513.519295] do_sys_openat2+0x20d/0x2d0 [162513.519300] do_sys_open+0x44/0x80 [162513.519304] do_syscall_64+0x33/0x80 [162513.519307] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.519309] RIP: 0033:0x7f5238f4a903 [162513.519310] Code: Bad RIP value. [162513.519312] RSP: 002b:00007fff67b97758 EFLAGS: 00000246 ORIG_RAX: 0000000000000055 [162513.519314] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f5238f4a903 [162513.519316] RDX: 0000000000000000 RSI: 00000000000001b6 RDI: 000055b1fbb0d470 [162513.519317] RBP: 00007fff67b978c0 R08: 0000000000000001 R09: 0000000000000002 [162513.519319] R10: 00007fff67b974f7 R11: 0000000000000246 R12: 0000000000000013 [162513.519320] R13: 00000000000001b6 R14: 00007fff67b97906 R15: 000055b1fad1c620 [162513.519332] INFO: task btrfs:1356211 blocked for more than 120 seconds. [162513.519727] Not tainted 5.9.0-rc6-btrfs-next-69 Freescale#1 [162513.520115] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.520508] task:btrfs state:D stack: 0 pid:1356211 ppid:1356178 flags:0x00004002 [162513.520511] Call Trace: [162513.520516] __schedule+0x5ce/0xd00 [162513.520519] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.520525] schedule+0x46/0xf0 [162513.520544] btrfs_scrub_pause+0x11f/0x180 [btrfs] [162513.520548] ? finish_wait+0x90/0x90 [162513.520562] btrfs_commit_transaction+0x45a/0xc30 [btrfs] [162513.520574] ? start_transaction+0xe0/0x5f0 [btrfs] [162513.520596] btrfs_dev_replace_finishing+0x6d8/0x711 [btrfs] [162513.520619] btrfs_dev_replace_by_ioctl.cold+0x1cc/0x1fd [btrfs] [162513.520639] btrfs_ioctl+0x2a25/0x36f0 [btrfs] [162513.520643] ? do_sigaction+0xf3/0x240 [162513.520645] ? find_held_lock+0x32/0x90 [162513.520648] ? do_sigaction+0xf3/0x240 [162513.520651] ? lock_acquired+0x33b/0x470 [162513.520655] ? _raw_spin_unlock_irq+0x24/0x50 [162513.520657] ? lockdep_hardirqs_on+0x7d/0x100 [162513.520660] ? _raw_spin_unlock_irq+0x35/0x50 [162513.520662] ? do_sigaction+0xf3/0x240 [162513.520671] ? __x64_sys_ioctl+0x83/0xb0 [162513.520672] __x64_sys_ioctl+0x83/0xb0 [162513.520677] do_syscall_64+0x33/0x80 [162513.520679] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.520681] RIP: 0033:0x7fc3cd307d87 [162513.520682] Code: Bad RIP value. [162513.520684] RSP: 002b:00007ffe30a56bb8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [162513.520686] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fc3cd307d87 [162513.520687] RDX: 00007ffe30a57a30 RSI: 00000000ca289435 RDI: 0000000000000003 [162513.520689] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [162513.520690] R10: 0000000000000008 R11: 0000000000000202 R12: 0000000000000003 [162513.520692] R13: 0000557323a212e0 R14: 00007ffe30a5a520 R15: 0000000000000001 [162513.520703] Showing all locks held in the system: [162513.520712] 1 lock held by khungtaskd/54: [162513.520713] #0: ffffffffb40a91a0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x15/0x197 [162513.520728] 1 lock held by in:imklog/596: [162513.520729] #0: ffff8f3f0d781400 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x4d/0x60 [162513.520782] 1 lock held by btrfs-transacti/1356167: [162513.520784] #0: ffff8f3d810cc848 (&fs_info->transaction_kthread_mutex){+.+.}-{3:3}, at: transaction_kthread+0x4a/0x170 [btrfs] [162513.520798] 1 lock held by btrfs/1356190: [162513.520800] #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write_file+0x22/0x60 [162513.520805] 1 lock held by fsstress/1356184: [162513.520806] #0: ffff8f3d576440e8 (&type->s_umount_key#62){++++}-{3:3}, at: iterate_supers+0x6f/0xf0 [162513.520811] 3 locks held by fsstress/1356185: [162513.520812] #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write+0x20/0x50 [162513.520815] Freescale#1: ffff8f3d80a650b8 (&type->i_mutex_dir_key#10){++++}-{3:3}, at: vfs_setxattr+0x50/0x120 [162513.520820] Freescale#2: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs] [162513.520833] 1 lock held by fsstress/1356196: [162513.520834] #0: ffff8f3d576440e8 (&type->s_umount_key#62){++++}-{3:3}, at: iterate_supers+0x6f/0xf0 [162513.520838] 3 locks held by fsstress/1356197: [162513.520839] #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write+0x20/0x50 [162513.520843] Freescale#1: ffff8f3d506465e8 (&type->i_mutex_dir_key#10){++++}-{3:3}, at: path_openat+0x2a7/0xa50 [162513.520846] Freescale#2: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs] [162513.520858] 2 locks held by btrfs/1356211: [162513.520859] #0: ffff8f3d810cde30 (&fs_info->dev_replace.lock_finishing_cancel_unmount){+.+.}-{3:3}, at: btrfs_dev_replace_finishing+0x52/0x711 [btrfs] [162513.520877] Freescale#1: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs] This was weird because the stack traces show that a transaction commit, triggered by a device replace operation, is blocking trying to pause any running scrubs but there are no stack traces of blocked tasks doing a scrub. After poking around with drgn, I noticed there was a scrub task that was constantly running and blocking for shorts periods of time: >>> t = find_task(prog, 1356190) >>> prog.stack_trace(t) #0 __schedule+0x5ce/0xcfc Freescale#1 schedule+0x46/0xe4 Freescale#2 schedule_timeout+0x1df/0x475 Freescale#3 btrfs_reada_wait+0xda/0x132 Freescale#4 scrub_stripe+0x2a8/0x112f Freescale#5 scrub_chunk+0xcd/0x134 Freescale#6 scrub_enumerate_chunks+0x29e/0x5ee Freescale#7 btrfs_scrub_dev+0x2d5/0x91b Freescale#8 btrfs_ioctl+0x7f5/0x36e7 Freescale#9 __x64_sys_ioctl+0x83/0xb0 Freescale#10 do_syscall_64+0x33/0x77 Freescale#11 entry_SYSCALL_64+0x7c/0x156 Which corresponds to: int btrfs_reada_wait(void *handle) { struct reada_control *rc = handle; struct btrfs_fs_info *fs_info = rc->fs_info; while (atomic_read(&rc->elems)) { if (!atomic_read(&fs_info->reada_works_cnt)) reada_start_machine(fs_info); wait_event_timeout(rc->wait, atomic_read(&rc->elems) == 0, (HZ + 9) / 10); } (...) So the counter "rc->elems" was set to 1 and never decreased to 0, causing the scrub task to loop forever in that function. Then I used the following script for drgn to check the readahead requests: $ cat dump_reada.py import sys import drgn from drgn import NULL, Object, cast, container_of, execscript, \ reinterpret, sizeof from drgn.helpers.linux import * mnt_path = b"/home/fdmanana/btrfs-tests/scratch_1" mnt = None for mnt in for_each_mount(prog, dst = mnt_path): pass if mnt is None: sys.stderr.write(f'Error: mount point {mnt_path} not found\n') sys.exit(1) fs_info = cast('struct btrfs_fs_info *', mnt.mnt.mnt_sb.s_fs_info) def dump_re(re): nzones = re.nzones.value_() print(f're at {hex(re.value_())}') print(f'\t logical {re.logical.value_()}') print(f'\t refcnt {re.refcnt.value_()}') print(f'\t nzones {nzones}') for i in range(nzones): dev = re.zones[i].device name = dev.name.str.string_() print(f'\t\t dev id {dev.devid.value_()} name {name}') print() for _, e in radix_tree_for_each(fs_info.reada_tree): re = cast('struct reada_extent *', e) dump_re(re) $ drgn dump_reada.py re at 0xffff8f3da9d25ad8 logical 38928384 refcnt 1 nzones 1 dev id 0 name b'/dev/sdd' $ So there was one readahead extent with a single zone corresponding to the source device of that last device replace operation logged in dmesg/syslog. Also the ID of that zone's device was 0 which is a special value set in the source device of a device replace operation when the operation finishes (constant BTRFS_DEV_REPLACE_DEVID set at btrfs_dev_replace_finishing()), confirming again that device /dev/sdd was the source of a device replace operation. Normally there should be as many zones in the readahead extent as there are devices, and I wasn't expecting the extent to be in a block group with a 'single' profile, so I went and confirmed with the following drgn script that there weren't any single profile block groups: $ cat dump_block_groups.py import sys import drgn from drgn import NULL, Object, cast, container_of, execscript, \ reinterpret, sizeof from drgn.helpers.linux import * mnt_path = b"/home/fdmanana/btrfs-tests/scratch_1" mnt = None for mnt in for_each_mount(prog, dst = mnt_path): pass if mnt is None: sys.stderr.write(f'Error: mount point {mnt_path} not found\n') sys.exit(1) fs_info = cast('struct btrfs_fs_info *', mnt.mnt.mnt_sb.s_fs_info) BTRFS_BLOCK_GROUP_DATA = (1 << 0) BTRFS_BLOCK_GROUP_SYSTEM = (1 << 1) BTRFS_BLOCK_GROUP_METADATA = (1 << 2) BTRFS_BLOCK_GROUP_RAID0 = (1 << 3) BTRFS_BLOCK_GROUP_RAID1 = (1 << 4) BTRFS_BLOCK_GROUP_DUP = (1 << 5) BTRFS_BLOCK_GROUP_RAID10 = (1 << 6) BTRFS_BLOCK_GROUP_RAID5 = (1 << 7) BTRFS_BLOCK_GROUP_RAID6 = (1 << 8) BTRFS_BLOCK_GROUP_RAID1C3 = (1 << 9) BTRFS_BLOCK_GROUP_RAID1C4 = (1 << 10) def bg_flags_string(bg): flags = bg.flags.value_() ret = '' if flags & BTRFS_BLOCK_GROUP_DATA: ret = 'data' if flags & BTRFS_BLOCK_GROUP_METADATA: if len(ret) > 0: ret += '|' ret += 'meta' if flags & BTRFS_BLOCK_GROUP_SYSTEM: if len(ret) > 0: ret += '|' ret += 'system' if flags & BTRFS_BLOCK_GROUP_RAID0: ret += ' raid0' elif flags & BTRFS_BLOCK_GROUP_RAID1: ret += ' raid1' elif flags & BTRFS_BLOCK_GROUP_DUP: ret += ' dup' elif flags & BTRFS_BLOCK_GROUP_RAID10: ret += ' raid10' elif flags & BTRFS_BLOCK_GROUP_RAID5: ret += ' raid5' elif flags & BTRFS_BLOCK_GROUP_RAID6: ret += ' raid6' elif flags & BTRFS_BLOCK_GROUP_RAID1C3: ret += ' raid1c3' elif flags & BTRFS_BLOCK_GROUP_RAID1C4: ret += ' raid1c4' else: ret += ' single' return ret def dump_bg(bg): print() print(f'block group at {hex(bg.value_())}') print(f'\t start {bg.start.value_()} length {bg.length.value_()}') print(f'\t flags {bg.flags.value_()} - {bg_flags_string(bg)}') bg_root = fs_info.block_group_cache_tree.address_of_() for bg in rbtree_inorder_for_each_entry('struct btrfs_block_group', bg_root, 'cache_node'): dump_bg(bg) $ drgn dump_block_groups.py block group at 0xffff8f3d673b0400 start 22020096 length 16777216 flags 258 - system raid6 block group at 0xffff8f3d53ddb400 start 38797312 length 536870912 flags 260 - meta raid6 block group at 0xffff8f3d5f4d9c00 start 575668224 length 2147483648 flags 257 - data raid6 block group at 0xffff8f3d08189000 start 2723151872 length 67108864 flags 258 - system raid6 block group at 0xffff8f3db70ff000 start 2790260736 length 1073741824 flags 260 - meta raid6 block group at 0xffff8f3d5f4dd800 start 3864002560 length 67108864 flags 258 - system raid6 block group at 0xffff8f3d67037000 start 3931111424 length 2147483648 flags 257 - data raid6 $ So there were only 2 reasons left for having a readahead extent with a single zone: reada_find_zone(), called when creating a readahead extent, returned NULL either because we failed to find the corresponding block group or because a memory allocation failed. With some additional and custom tracing I figured out that on every further ocurrence of the problem the block group had just been deleted when we were looping to create the zones for the readahead extent (at reada_find_extent()), so we ended up with only one zone in the readahead extent, corresponding to a device that ends up getting replaced. So after figuring that out it became obvious why the hang happens: 1) Task A starts a scrub on any device of the filesystem, except for device /dev/sdd; 2) Task B starts a device replace with /dev/sdd as the source device; 3) Task A calls btrfs_reada_add() from scrub_stripe() and it is currently starting to scrub a stripe from block group X. This call to btrfs_reada_add() is the one for the extent tree. When btrfs_reada_add() calls reada_add_block(), it passes the logical address of the extent tree's root node as its 'logical' argument - a value of 38928384; 4) Task A then enters reada_find_extent(), called from reada_add_block(). It finds there isn't any existing readahead extent for the logical address 38928384, so it proceeds to the path of creating a new one. It calls btrfs_map_block() to find out which stripes exist for the block group X. On the first iteration of the for loop that iterates over the stripes, it finds the stripe for device /dev/sdd, so it creates one zone for that device and adds it to the readahead extent. Before getting into the second iteration of the loop, the cleanup kthread deletes block group X because it was empty. So in the iterations for the remaining stripes it does not add more zones to the readahead extent, because the calls to reada_find_zone() returned NULL because they couldn't find block group X anymore. As a result the new readahead extent has a single zone, corresponding to the device /dev/sdd; 4) Before task A returns to btrfs_reada_add() and queues the readahead job for the readahead work queue, task B finishes the device replace and at btrfs_dev_replace_finishing() swaps the device /dev/sdd with the new device /dev/sdg; 5) Task A returns to reada_add_block(), which increments the counter "->elems" of the reada_control structure allocated at btrfs_reada_add(). Then it returns back to btrfs_reada_add() and calls reada_start_machine(). This queues a job in the readahead work queue to run the function reada_start_machine_worker(), which calls __reada_start_machine(). At __reada_start_machine() we take the device list mutex and for each device found in the current device list, we call reada_start_machine_dev() to start the readahead work. However at this point the device /dev/sdd was already freed and is not in the device list anymore. This means the corresponding readahead for the extent at 38928384 is never started, and therefore the "->elems" counter of the reada_control structure allocated at btrfs_reada_add() never goes down to 0, causing the call to btrfs_reada_wait(), done by the scrub task, to wait forever. Note that the readahead request can be made either after the device replace started or before it started, however in pratice it is very unlikely that a device replace is able to start after a readahead request is made and is able to complete before the readahead request completes - maybe only on a very small and nearly empty filesystem. This hang however is not the only problem we can have with readahead and device removals. When the readahead extent has other zones other than the one corresponding to the device that is being removed (either by a device replace or a device remove operation), we risk having a use-after-free on the device when dropping the last reference of the readahead extent. For example if we create a readahead extent with two zones, one for the device /dev/sdd and one for the device /dev/sde: 1) Before the readahead worker starts, the device /dev/sdd is removed, and the corresponding btrfs_device structure is freed. However the readahead extent still has the zone pointing to the device structure; 2) When the readahead worker starts, it only finds device /dev/sde in the current device list of the filesystem; 3) It starts the readahead work, at reada_start_machine_dev(), using the device /dev/sde; 4) Then when it finishes reading the extent from device /dev/sde, it calls __readahead_hook() which ends up dropping the last reference on the readahead extent through the last call to reada_extent_put(); 5) At reada_extent_put() it iterates over each zone of the readahead extent and attempts to delete an element from the device's 'reada_extents' radix tree, resulting in a use-after-free, as the device pointer of the zone for /dev/sdd is now stale. We can also access the device after dropping the last reference of a zone, through reada_zone_release(), also called by reada_extent_put(). And a device remove suffers the same problem, however since it shrinks the device size down to zero before removing the device, it is very unlikely to still have readahead requests not completed by the time we free the device, the only possibility is if the device has a very little space allocated. While the hang problem is exclusive to scrub, since it is currently the only user of btrfs_reada_add() and btrfs_reada_wait(), the use-after-free problem affects any path that triggers readhead, which includes btree_readahead_hook() and __readahead_hook() (a readahead worker can trigger readahed for the children of a node) for example - any path that ends up calling reada_add_block() can trigger the use-after-free after a device is removed. So fix this by waiting for any readahead requests for a device to complete before removing a device, ensuring that while waiting for existing ones no new ones can be made. This problem has been around for a very long time - the readahead code was added in 2011, device remove exists since 2008 and device replace was introduced in 2013, hard to pick a specific commit for a git Fixes tag. CC: [email protected] # 4.4+ Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

Currently 'while (q->queued > 0)' loop was removed from mt76u_stop_tx() code. This causes crash on device removal as we try to cleanup empty queue: [ 96.495571] kernel BUG at include/linux/skbuff.h:2297! [ 96.498983] invalid opcode: 0000 [#1] SMP PTI [ 96.501162] CPU: 3 PID: 27 Comm: kworker/3:0 Not tainted 5.10.0-rc5+ #11 [ 96.502754] Hardware name: LENOVO 20DGS08H00/20DGS08H00, BIOS J5ET48WW (1.19 ) 08/27/2015 [ 96.504378] Workqueue: usb_hub_wq hub_event [ 96.505983] RIP: 0010:skb_pull+0x2d/0x30 [ 96.507576] Code: 00 00 8b 47 70 39 c6 77 1e 29 f0 89 47 70 3b 47 74 72 17 48 8b 87 c8 00 00 00 89 f6 48 01 f0 48 89 87 c8 00 00 00 c3 31 c0 c3 <0f> 0b 90 0f 1f 44 00 00 53 48 89 fb 48 8b bf c8 00 00 00 8b 43 70 [ 96.509296] RSP: 0018:ffffb11b801639b8 EFLAGS: 00010287 [ 96.511038] RAX: 000000001c6939ed RBX: ffffb11b801639f8 RCX: 0000000000000000 [ 96.512964] RDX: ffffb11b801639f8 RSI: 0000000000000018 RDI: ffff90c64e4fb800 [ 96.514710] RBP: ffff90c654551ee0 R08: ffff90c652bce7a8 R09: ffffb11b80163728 [ 96.516450] R10: 0000000000000001 R11: 0000000000000001 R12: ffff90c64e4fb800 [ 96.519749] R13: 0000000000000010 R14: 0000000000000020 R15: ffff90c64e352ce8 [ 96.523455] FS: 0000000000000000(0000) GS:ffff90c96eec0000(0000) knlGS:0000000000000000 [ 96.527171] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 96.530900] CR2: 0000242556f18288 CR3: 0000000146a10002 CR4: 00000000003706e0 [ 96.534678] Call Trace: [ 96.538418] mt76x02u_tx_complete_skb+0x1f/0x50 [mt76x02_usb] [ 96.542231] mt76_queue_tx_complete+0x23/0x50 [mt76] [ 96.546028] mt76u_stop_tx.cold+0x71/0xa2 [mt76_usb] [ 96.549797] mt76x0u_stop+0x2f/0x90 [mt76x0u] [ 96.553638] drv_stop+0x33/0xd0 [mac80211] [ 96.557449] ieee80211_do_stop+0x558/0x860 [mac80211] [ 96.561262] ? dev_deactivate_many+0x298/0x2d0 [ 96.565101] ieee80211_stop+0x16/0x20 [mac80211] Fix that by adding while loop again. We need loop, not just single check, to clean all pending entries. Additionally move mt76_worker_disable/enable after !mt76_has_tx_pending() as we want to tx_worker to run to process tx queues, while we wait for exactly that. I was a bit worried about accessing q->queued without lock, but mt76_worker_disable() -> kthread_park() should assure this value will be seen updated on other cpus. Fixes: fe5b5ab ("mt76: unify queue tx cleanup code") Signed-off-by: Stanislaw Gruszka <[email protected]> Acked-by: Felix Fietkau <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://lore.kernel.org/r/[email protected]

[ Upstream commit c5c97ca ] The ubsan reported the following error. It was because sample's raw data missed u32 padding at the end. So it broke the alignment of the array after it. The raw data contains an u32 size prefix so the data size should have an u32 padding after 8-byte aligned data. 27: Sample parsing :util/synthetic-events.c:1539:4: runtime error: store to misaligned address 0x62100006b9bc for type '__u64' (aka 'unsigned long long'), which requires 8 byte alignment 0x62100006b9bc: note: pointer points here 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ #0 0x561532a9fc96 in perf_event__synthesize_sample util/synthetic-events.c:1539:13 Freescale#1 0x5615327f4a4f in do_test tests/sample-parsing.c:284:8 Freescale#2 0x5615327f3f50 in test__sample_parsing tests/sample-parsing.c:381:9 Freescale#3 0x56153279d3a1 in run_test tests/builtin-test.c:424:9 Freescale#4 0x56153279c836 in test_and_print tests/builtin-test.c:454:9 Freescale#5 0x56153279b7eb in __cmd_test tests/builtin-test.c:675:4 Freescale#6 0x56153279abf0 in cmd_test tests/builtin-test.c:821:9 Freescale#7 0x56153264e796 in run_builtin perf.c:312:11 Freescale#8 0x56153264cf03 in handle_internal_command perf.c:364:8 Freescale#9 0x56153264e47d in run_argv perf.c:408:2 Freescale#10 0x56153264c9a9 in main perf.c:538:3 Freescale#11 0x7f137ab6fbbc in __libc_start_main (/lib64/libc.so.6+0x38bbc) Freescale#12 0x561532596828 in _start ... SUMMARY: UndefinedBehaviorSanitizer: misaligned-pointer-use util/synthetic-events.c:1539:4 in Fixes: 045f8cd ("perf tests: Add a sample parsing test") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

commit 4d14c5c upstream Calling btrfs_qgroup_reserve_meta_prealloc from btrfs_delayed_inode_reserve_metadata can result in flushing delalloc while holding a transaction and delayed node locks. This is deadlock prone. In the past multiple commits: * ae5e070 ("btrfs: qgroup: don't try to wait flushing if we're already holding a transaction") * 6f23277 ("btrfs: qgroup: don't commit transaction when we already hold the handle") Tried to solve various aspects of this but this was always a whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock scenario involving btrfs_delayed_node::mutex. Namely, one thread can call btrfs_dirty_inode as a result of reading a file and modifying its atime: PID: 6963 TASK: ffff8c7f3f94c000 CPU: 2 COMMAND: "test" #0 __schedule at ffffffffa529e07d Freescale#1 schedule at ffffffffa529e4ff Freescale#2 schedule_timeout at ffffffffa52a1bdd Freescale#3 wait_for_completion at ffffffffa529eeea <-- sleeps with delayed node mutex held Freescale#4 start_delalloc_inodes at ffffffffc0380db5 Freescale#5 btrfs_start_delalloc_snapshot at ffffffffc0393836 Freescale#6 try_flush_qgroup at ffffffffc03f04b2 Freescale#7 __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6 <-- tries to reserve space and starts delalloc inodes. Freescale#8 btrfs_delayed_update_inode at ffffffffc03e31aa <-- acquires delayed node mutex Freescale#9 btrfs_update_inode at ffffffffc0385ba8 Freescale#10 btrfs_dirty_inode at ffffffffc038627b <-- TRANSACTIION OPENED Freescale#11 touch_atime at ffffffffa4cf0000 Freescale#12 generic_file_read_iter at ffffffffa4c1f123 Freescale#13 new_sync_read at ffffffffa4ccdc8a Freescale#14 vfs_read at ffffffffa4cd0849 Freescale#15 ksys_read at ffffffffa4cd0bd1 Freescale#16 do_syscall_64 at ffffffffa4a052eb Freescale#17 entry_SYSCALL_64_after_hwframe at ffffffffa540008c This will cause an asynchronous work to flush the delalloc inodes to happen which can try to acquire the same delayed_node mutex: PID: 455 TASK: ffff8c8085fa4000 CPU: 5 COMMAND: "kworker/u16:30" #0 __schedule at ffffffffa529e07d Freescale#1 schedule at ffffffffa529e4ff Freescale#2 schedule_preempt_disabled at ffffffffa529e80a Freescale#3 __mutex_lock at ffffffffa529fdcb <-- goes to sleep, never wakes up. Freescale#4 btrfs_delayed_update_inode at ffffffffc03e3143 <-- tries to acquire the mutex Freescale#5 btrfs_update_inode at ffffffffc0385ba8 <-- this is the same inode that pid 6963 is holding Freescale#6 cow_file_range_inline.constprop.78 at ffffffffc0386be7 Freescale#7 cow_file_range at ffffffffc03879c1 Freescale#8 btrfs_run_delalloc_range at ffffffffc038894c Freescale#9 writepage_delalloc at ffffffffc03a3c8f Freescale#10 __extent_writepage at ffffffffc03a4c01 Freescale#11 extent_write_cache_pages at ffffffffc03a500b Freescale#12 extent_writepages at ffffffffc03a6de2 Freescale#13 do_writepages at ffffffffa4c277eb Freescale#14 __filemap_fdatawrite_range at ffffffffa4c1e5bb Freescale#15 btrfs_run_delalloc_work at ffffffffc0380987 <-- starts running delayed nodes Freescale#16 normal_work_helper at ffffffffc03b706c Freescale#17 process_one_work at ffffffffa4aba4e4 Freescale#18 worker_thread at ffffffffa4aba6fd Freescale#19 kthread at ffffffffa4ac0a3d Freescale#20 ret_from_fork at ffffffffa54001ff To fully address those cases the complete fix is to never issue any flushing while holding the transaction or the delayed node lock. This patch achieves it by calling qgroup_reserve_meta directly which will either succeed without flushing or will fail and return -EDQUOT. In the latter case that return value is going to be propagated to btrfs_dirty_inode which will fallback to start a new transaction. That's fine as the majority of time we expect the inode will have BTRFS_DELAYED_NODE_INODE_DIRTY flag set which will result in directly copying the in-memory state. Fixes: c53e965 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT") CC: [email protected] # 5.10+ Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: David Sterba <[email protected]> [sudip: adjust context] Signed-off-by: Sudip Mukherjee <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 90bd070 upstream. The following deadlock is detected: truncate -> setattr path is waiting for pending direct IO to be done (inode->i_dio_count become zero) with inode->i_rwsem held (down_write). PID: 14827 TASK: ffff881686a9af80 CPU: 20 COMMAND: "ora_p005_hrltd9" #0 __schedule at ffffffff818667cc Freescale#1 schedule at ffffffff81866de6 Freescale#2 inode_dio_wait at ffffffff812a2d04 Freescale#3 ocfs2_setattr at ffffffffc05f322e [ocfs2] Freescale#4 notify_change at ffffffff812a5a09 Freescale#5 do_truncate at ffffffff812808f5 Freescale#6 do_sys_ftruncate.constprop.18 at ffffffff81280cf2 Freescale#7 sys_ftruncate at ffffffff81280d8e Freescale#8 do_syscall_64 at ffffffff81003949 Freescale#9 entry_SYSCALL_64_after_hwframe at ffffffff81a001ad dio completion path is going to complete one direct IO (decrement inode->i_dio_count), but before that it hung at locking inode->i_rwsem: #0 __schedule+700 at ffffffff818667cc Freescale#1 schedule+54 at ffffffff81866de6 Freescale#2 rwsem_down_write_failed+536 at ffffffff8186aa28 Freescale#3 call_rwsem_down_write_failed+23 at ffffffff8185a1b7 Freescale#4 down_write+45 at ffffffff81869c9d Freescale#5 ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2] Freescale#6 ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2] Freescale#7 dio_complete+140 at ffffffff812c873c Freescale#8 dio_aio_complete_work+25 at ffffffff812c89f9 Freescale#9 process_one_work+361 at ffffffff810b1889 Freescale#10 worker_thread+77 at ffffffff810b233d Freescale#11 kthread+261 at ffffffff810b7fd5 Freescale#12 ret_from_fork+62 at ffffffff81a0035e Thus above forms ABBA deadlock. The same deadlock was mentioned in upstream commit 28f5a8a ("ocfs2: should wait dio before inode lock in ocfs2_setattr()"). It seems that that commit only removed the cluster lock (the victim of above dead lock) from the ABBA deadlock party. End-user visible effects: Process hang in truncate -> ocfs2_setattr path and other processes hang at ocfs2_dio_end_io_write path. This is to fix the deadlock itself. It removes inode_lock() call from dio completion path to remove the deadlock and add ip_alloc_sem lock in setattr path to synchronize the inode modifications. [[email protected]: remove the "had_alloc_lock" as suggested] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Wengang Wang <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit c5c97ca ] The ubsan reported the following error. It was because sample's raw data missed u32 padding at the end. So it broke the alignment of the array after it. The raw data contains an u32 size prefix so the data size should have an u32 padding after 8-byte aligned data. 27: Sample parsing :util/synthetic-events.c:1539:4: runtime error: store to misaligned address 0x62100006b9bc for type '__u64' (aka 'unsigned long long'), which requires 8 byte alignment 0x62100006b9bc: note: pointer points here 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ #0 0x561532a9fc96 in perf_event__synthesize_sample util/synthetic-events.c:1539:13 Freescale#1 0x5615327f4a4f in do_test tests/sample-parsing.c:284:8 Freescale#2 0x5615327f3f50 in test__sample_parsing tests/sample-parsing.c:381:9 Freescale#3 0x56153279d3a1 in run_test tests/builtin-test.c:424:9 Freescale#4 0x56153279c836 in test_and_print tests/builtin-test.c:454:9 Freescale#5 0x56153279b7eb in __cmd_test tests/builtin-test.c:675:4 Freescale#6 0x56153279abf0 in cmd_test tests/builtin-test.c:821:9 Freescale#7 0x56153264e796 in run_builtin perf.c:312:11 Freescale#8 0x56153264cf03 in handle_internal_command perf.c:364:8 Freescale#9 0x56153264e47d in run_argv perf.c:408:2 Freescale#10 0x56153264c9a9 in main perf.c:538:3 Freescale#11 0x7f137ab6fbbc in __libc_start_main (/lib64/libc.so.6+0x38bbc) Freescale#12 0x561532596828 in _start ... SUMMARY: UndefinedBehaviorSanitizer: misaligned-pointer-use util/synthetic-events.c:1539:4 in Fixes: 045f8cd ("perf tests: Add a sample parsing test") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

commit 90bd070 upstream. The following deadlock is detected: truncate -> setattr path is waiting for pending direct IO to be done (inode->i_dio_count become zero) with inode->i_rwsem held (down_write). PID: 14827 TASK: ffff881686a9af80 CPU: 20 COMMAND: "ora_p005_hrltd9" #0 __schedule at ffffffff818667cc Freescale#1 schedule at ffffffff81866de6 Freescale#2 inode_dio_wait at ffffffff812a2d04 Freescale#3 ocfs2_setattr at ffffffffc05f322e [ocfs2] Freescale#4 notify_change at ffffffff812a5a09 Freescale#5 do_truncate at ffffffff812808f5 Freescale#6 do_sys_ftruncate.constprop.18 at ffffffff81280cf2 Freescale#7 sys_ftruncate at ffffffff81280d8e Freescale#8 do_syscall_64 at ffffffff81003949 Freescale#9 entry_SYSCALL_64_after_hwframe at ffffffff81a001ad dio completion path is going to complete one direct IO (decrement inode->i_dio_count), but before that it hung at locking inode->i_rwsem: #0 __schedule+700 at ffffffff818667cc Freescale#1 schedule+54 at ffffffff81866de6 Freescale#2 rwsem_down_write_failed+536 at ffffffff8186aa28 Freescale#3 call_rwsem_down_write_failed+23 at ffffffff8185a1b7 Freescale#4 down_write+45 at ffffffff81869c9d Freescale#5 ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2] Freescale#6 ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2] Freescale#7 dio_complete+140 at ffffffff812c873c Freescale#8 dio_aio_complete_work+25 at ffffffff812c89f9 Freescale#9 process_one_work+361 at ffffffff810b1889 Freescale#10 worker_thread+77 at ffffffff810b233d Freescale#11 kthread+261 at ffffffff810b7fd5 Freescale#12 ret_from_fork+62 at ffffffff81a0035e Thus above forms ABBA deadlock. The same deadlock was mentioned in upstream commit 28f5a8a ("ocfs2: should wait dio before inode lock in ocfs2_setattr()"). It seems that that commit only removed the cluster lock (the victim of above dead lock) from the ABBA deadlock party. End-user visible effects: Process hang in truncate -> ocfs2_setattr path and other processes hang at ocfs2_dio_end_io_write path. This is to fix the deadlock itself. It removes inode_lock() call from dio completion path to remove the deadlock and add ip_alloc_sem lock in setattr path to synchronize the inode modifications. [[email protected]: remove the "had_alloc_lock" as suggested] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Wengang Wang <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 4d14c5c upstream Calling btrfs_qgroup_reserve_meta_prealloc from btrfs_delayed_inode_reserve_metadata can result in flushing delalloc while holding a transaction and delayed node locks. This is deadlock prone. In the past multiple commits: * ae5e070 ("btrfs: qgroup: don't try to wait flushing if we're already holding a transaction") * 6f23277 ("btrfs: qgroup: don't commit transaction when we already hold the handle") Tried to solve various aspects of this but this was always a whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock scenario involving btrfs_delayed_node::mutex. Namely, one thread can call btrfs_dirty_inode as a result of reading a file and modifying its atime: PID: 6963 TASK: ffff8c7f3f94c000 CPU: 2 COMMAND: "test" #0 __schedule at ffffffffa529e07d Freescale#1 schedule at ffffffffa529e4ff Freescale#2 schedule_timeout at ffffffffa52a1bdd Freescale#3 wait_for_completion at ffffffffa529eeea <-- sleeps with delayed node mutex held Freescale#4 start_delalloc_inodes at ffffffffc0380db5 Freescale#5 btrfs_start_delalloc_snapshot at ffffffffc0393836 Freescale#6 try_flush_qgroup at ffffffffc03f04b2 Freescale#7 __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6 <-- tries to reserve space and starts delalloc inodes. Freescale#8 btrfs_delayed_update_inode at ffffffffc03e31aa <-- acquires delayed node mutex Freescale#9 btrfs_update_inode at ffffffffc0385ba8 Freescale#10 btrfs_dirty_inode at ffffffffc038627b <-- TRANSACTIION OPENED Freescale#11 touch_atime at ffffffffa4cf0000 Freescale#12 generic_file_read_iter at ffffffffa4c1f123 Freescale#13 new_sync_read at ffffffffa4ccdc8a Freescale#14 vfs_read at ffffffffa4cd0849 Freescale#15 ksys_read at ffffffffa4cd0bd1 Freescale#16 do_syscall_64 at ffffffffa4a052eb Freescale#17 entry_SYSCALL_64_after_hwframe at ffffffffa540008c This will cause an asynchronous work to flush the delalloc inodes to happen which can try to acquire the same delayed_node mutex: PID: 455 TASK: ffff8c8085fa4000 CPU: 5 COMMAND: "kworker/u16:30" #0 __schedule at ffffffffa529e07d Freescale#1 schedule at ffffffffa529e4ff Freescale#2 schedule_preempt_disabled at ffffffffa529e80a Freescale#3 __mutex_lock at ffffffffa529fdcb <-- goes to sleep, never wakes up. Freescale#4 btrfs_delayed_update_inode at ffffffffc03e3143 <-- tries to acquire the mutex Freescale#5 btrfs_update_inode at ffffffffc0385ba8 <-- this is the same inode that pid 6963 is holding Freescale#6 cow_file_range_inline.constprop.78 at ffffffffc0386be7 Freescale#7 cow_file_range at ffffffffc03879c1 Freescale#8 btrfs_run_delalloc_range at ffffffffc038894c Freescale#9 writepage_delalloc at ffffffffc03a3c8f Freescale#10 __extent_writepage at ffffffffc03a4c01 Freescale#11 extent_write_cache_pages at ffffffffc03a500b Freescale#12 extent_writepages at ffffffffc03a6de2 Freescale#13 do_writepages at ffffffffa4c277eb Freescale#14 __filemap_fdatawrite_range at ffffffffa4c1e5bb Freescale#15 btrfs_run_delalloc_work at ffffffffc0380987 <-- starts running delayed nodes Freescale#16 normal_work_helper at ffffffffc03b706c Freescale#17 process_one_work at ffffffffa4aba4e4 Freescale#18 worker_thread at ffffffffa4aba6fd Freescale#19 kthread at ffffffffa4ac0a3d Freescale#20 ret_from_fork at ffffffffa54001ff To fully address those cases the complete fix is to never issue any flushing while holding the transaction or the delayed node lock. This patch achieves it by calling qgroup_reserve_meta directly which will either succeed without flushing or will fail and return -EDQUOT. In the latter case that return value is going to be propagated to btrfs_dirty_inode which will fallback to start a new transaction. That's fine as the majority of time we expect the inode will have BTRFS_DELAYED_NODE_INODE_DIRTY flag set which will result in directly copying the in-memory state. Fixes: c53e965 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT") CC: [email protected] # 5.10+ Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Anand Jain <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

kedi1982 and others added 30 commits July 27, 2017 15:08

staging: rtl8188eu: add TL-WN722N v2 support

964a21a

commit 5a1d4c5 upstream. Add support for USB Device TP-Link TL-WN722N v2. VendorID: 0x2357, ProductID: 0x010c Signed-off-by: Michael Gugino <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

Linux 4.9.40

efcfbfb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge latest 4.9.y stable patches (4.9.43)#11

Merge latest 4.9.y stable patches (4.9.43)#11
gibsson wants to merge 1591 commits intoFreescale:4.9.x+fslcfrom
gibsson:4.9.x+fslc

gibsson commented Aug 15, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

Conversation

gibsson commented Aug 15, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants