Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync with linus #4

Merged
merged 10,000 commits into from
Jun 14, 2018
Merged

sync with linus #4

merged 10,000 commits into from
Jun 14, 2018

Conversation

neilbrown
Copy link
Owner

No description provided.

raven-au and others added 30 commits June 7, 2018 17:34
Update Kconfig and add a depricated warning.

[[email protected]: make autofs4 Kconfig depend on AUTOFS_FS]
  Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ian Kent <[email protected]>
Tested-by: Randy Dunlap <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Update Makefile to build from source in fs/autofs instead of fs/autofs4.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ian Kent <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Delete the now unused autofs4 module files.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ian Kent <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Create Makefile and Kconfig for autofs module.

[[email protected]: make autofs4 Kconfig depend on AUTOFS_FS]
  Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ian Kent <[email protected]>
Tested-by: Randy Dunlap <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
There are two files in Documentation/filsystems that should now use
autofs rather than autofs4 in their names.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ian Kent <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Finally remove autofs4 references in the filesystems documentation.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ian Kent <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Update the autofs entry in MAINTAINERS to reflect the rename of autofs4
to autofs.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ian Kent <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Due to the autofs4 module using a file system type name of autofs
different from the module containing directory name autoload did not
function properly.  To work around this kernel configurations have often
elected to build the module into the kernel.

This can result in selinux policies that prohibit autoloading of the
autofs module which need to be changed.

Add a comment about this to "possible changes" section of the autofs4
module help.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ian Kent <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Remove includes that aren't needed from autofs (and fs/compat_ioctl.c).

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ian Kent <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
We don't set "*name" so it's slightly nicer to just pass "name" instead
of "&name".

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Acked-by: Ian Kent <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Merge updates from Andrew Morton:

 - a few misc things

 - ocfs2 updates

 - v9fs updates

 - MM

 - procfs updates

 - lib/ updates

 - autofs updates

* emailed patches from Andrew Morton <[email protected]>: (118 commits)
  autofs: small cleanup in autofs_getpath()
  autofs: clean up includes
  autofs: comment on selinux changes needed for module autoload
  autofs: update MAINTAINERS entry for autofs
  autofs: use autofs instead of autofs4 in documentation
  autofs: rename autofs documentation files
  autofs: create autofs Kconfig and Makefile
  autofs: delete fs/autofs4 source files
  autofs: update fs/autofs4/Makefile
  autofs: update fs/autofs4/Kconfig
  autofs: copy autofs4 to autofs
  autofs4: use autofs instead of autofs4 everywhere
  autofs4: merge auto_fs.h and auto_fs4.h
  fs/binfmt_misc.c: do not allow offset overflow
  checkpatch: improve patch recognition
  lib/ucs2_string.c: add MODULE_LICENSE()
  lib/mpi: headers cleanup
  lib/percpu_ida.c: use _irqsave() instead of local_irq_save() + spin_lock
  lib/idr.c: remove simple_ida_lock
  lib/bitmap.c: micro-optimization for __bitmap_complement()
  ...
If AHD_DEBUG is enabled and ahd_platform_alloc fails then ahd is set to
null and the debug printk dereferences ahd when passing it to ahd_name.
Fix this by moving the debug printk to before the call to
ahd_platform_alloc where ahd is not null at that point.

Detected by CoverityScan, CID#100296 ("Explicit null dereference")

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
This patch fixes a crash on qla2x00_mailbox_command caused when the driver
is on UNLOADING state and tries to call qla2x00_poll, which triggers a
NULL pointer dereference.

Signed-off-by: Rodrigo R. Galvao <[email protected]>
Signed-off-by: Mauro S. M. Rodrigues <[email protected]>
Acked-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Some newer target uses "Status Qualifier" response in a returned "Busy
Status". This new response code of 0x4001, which is "Scope" bits,
translates to "Affects all units accessible by target".  Due to this new
value returned in the Scope bits, driver was using that value as timeout
value which resulted into driver waiting for 27min timeout.

This patch masks off this Scope bits so that driver does not use this
value as retry delay time.

Cc: <[email protected]>
Signed-off-by: Anil Gurumurthy <[email protected]>
Signed-off-by: Giridhar Malavali <[email protected]>
Signed-off-by: Himanshu Madhani <[email protected]>
Reviewed-by: Ewan D. Milne <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
This leak was introduced in 91cb74f and caused us
to leak one small buffer for every symlink query.

Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
If d_alloc_parallel() returns ERR_PTR(...), we don't want to dput()
that.  Small reorganization allows to have all error-in-lookup
cases rejoin the main codepath after dput(child), avoiding the
entire problem.

Spotted-by: Tetsuo Handa <[email protected]>
Fixes: 0168b9e "procfs: switch instantiate_t to d_splice_alias()"
Signed-off-by: Al Viro <[email protected]>
…op.org/drm/drm-misc into drm-next

drm-misc-next-fixes for v4.18:

Driver changes:
- Plug small memory leak in vc4. (anholt)
- Depend on MMU in v3d. (arnd)

Signed-off-by: Dave Airlie <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
hid-alps driver cleanups wrt. t4_read_write_register() handling
from Christophe Jaillet
Redragon Asura support from Robert Munteanu
Valve Steam Controller support from Rodrigo Rivas Costa
Assorted smaller fixes to i2c-hid driver
- improvement of duplicate usage handling in hid-input from Benjamin Tissoires
- Win 8.1 precisioun touchpad spec implementation from Benjamin Tissoires
RMI4 correct split report handling from Benjamin Tissoires
Support for "In Range" flag for Wacom Intuos/Bamboo devices from Jason Gerecke
Enable HWP boost on Skylake server and workstations.

Reported-by: Mel Gorman <[email protected]>
Tested-by: Giovanni Gherdovich <[email protected]>
Signed-off-by: Srinivas Pandruvada <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
According to current code implementation, detecting the long
idle period is done by checking if the interval between two
adjacent utilization update handlers is long enough. Although
this mechanism can detect if the idle period is long enough
(no utilization hooks invoked during idle period), it might
not cover a corner case: if the task has occupied the CPU
for too long which causes no context switches during that
period, then no utilization handler will be launched until this
high prio task is scheduled out. As a result, the idle_periods
field might be calculated incorrectly because it regards the
100% load as 0% and makes the conservative governor who uses
this field confusing.

Change the detection to compare the idle_time with sampling_rate
directly.

Reported-by: Artem S. Tashkinov <[email protected]>
Signed-off-by: Chen Yu <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Cc: All applicable <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Commit 21c54b7 ("kconfig: show compiler version text in the top
comment") was intended to detect the compiler upgrade, but Geert
reported a breakage on the m68k build.

The compiler upgrade is detected by the change of the environment
variable, CC_VERSION_TEXT, which contains the first line of the output
from $(CC) --version.  Currently, this works well when CROSS_COMPILE
is given via the environment variable or the Make command line.

However, some architectures such as m68k can specify CROSS_COMPILE
from arch/$(SRCARCH)/Makefile as well.  In this case, "make ARCH=m68k"
ends up with endless syncconfig loop.

  $ make ARCH=m68k defconfig
  *** Default configuration is based on 'multi_defconfig'
  #
  # configuration written to .config
  #
  $ make ARCH=m68k
  scripts/kconfig/conf  --syncconfig Kconfig
  scripts/kconfig/conf  --syncconfig Kconfig
  scripts/kconfig/conf  --syncconfig Kconfig
  scripts/kconfig/conf  --syncconfig Kconfig

Things are happening like this:

Because arch/$(SRCARCH)/Makefile is included after CC_VERSION_TEXT
is set, it contains the host compiler version in the defconfig phase.

To create or update auto.conf, the following line is triggered:

include/config/%.conf: $(KCONFIG_CONFIG) include/config/auto.conf.cmd
        $(Q)$(MAKE) -f $(srctree)/Makefile syncconfig

This recurses the top Makefile after arch/$(SRCARCH)/Makefile is
included.  CROSS_COMPILE is set to a m68k toolchain prefix and
exported to the recursed Make.  Then, syncconfig is invoked with
the target compiler version in CC_VERSION_TEXT.

The Make will restart because auto.conf and auto.conf.cmd have been
updated.  At this point, CROSS_COMPILE is reset, so CC_VERSION_TEXT
is set to the host compiler version again.  Then, syncconfig is
triggered due to the change of CC_VERSION_TEXT.  This loop continues
eternally.

To fix this problem, $(CC_VERSION_TEXT) must be evaluated only after
arch/$(SRCARCH)/Makefile.  Setting it earlier is OK as long as it is
defined by using the '=' operator instead of ':='.

For the defconfig phase, $(CC_VERSION_TEXT) is evaluated when Kbuild
descends into scripts/kconfig/, so it contains the target compiler
version correctly.

include/config/auto.conf.cmd references $(CC_VERSION_TEXT) as well,
so it must be included after arch/$(SRCARCH)/Makefile.

Fixes: 21c54b7 ("kconfig: show compiler version text in the top comment")
Reported-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
Tested-by: Geert Uytterhoeven <[email protected]>
neilbrown pushed a commit that referenced this pull request Jan 1, 2020
[ Upstream commit f80c5da ]

This commit makes the kernel not send the next queued HCI command until
a command complete arrives for the last HCI command sent to the
controller. This change avoids a problem with some buggy controllers
(seen on two SKUs of QCA9377) that send an extra command complete event
for the previous command after the kernel had already sent a new HCI
command to the controller.

The problem was reproduced when starting an active scanning procedure,
where an extra command complete event arrives for the LE_SET_RANDOM_ADDR
command. When this happends the kernel ends up not processing the
command complete for the following commmand, LE_SET_SCAN_PARAM, and
ultimately behaving as if a passive scanning procedure was being
performed, when in fact controller is performing an active scanning
procedure. This makes it impossible to discover BLE devices as no device
found events are sent to userspace.

This problem is reproducible on 100% of the attempts on the affected
controllers. The extra command complete event can be seen at timestamp
27.420131 on the btmon logs bellow.

Bluetooth monitor ver 5.50
= Note: Linux version 5.0.0+ (x86_64)                                  0.352340
= Note: Bluetooth subsystem version 2.22                               0.352343
= New Index: 80:C5:F2:8F:87:84 (Primary,USB,hci0)               [hci0] 0.352344
= Open Index: 80:C5:F2:8F:87:84                                 [hci0] 0.352345
= Index Info: 80:C5:F2:8F:87:84 (Qualcomm)                      [hci0] 0.352346
@ MGMT Open: bluetoothd (privileged) version 1.14             {0x0001} 0.352347
@ MGMT Open: btmon (privileged) version 1.14                  {0x0002} 0.352366
@ MGMT Open: btmgmt (privileged) version 1.14                {0x0003} 27.302164
@ MGMT Command: Start Discovery (0x0023) plen 1       {0x0003} [hci0] 27.302310
        Address type: 0x06
          LE Public
          LE Random
< HCI Command: LE Set Random Address (0x08|0x0005) plen 6   #1 [hci0] 27.302496
        Address: 15:60:F2:91:B2:24 (Non-Resolvable)
> HCI Event: Command Complete (0x0e) plen 4                 #2 [hci0] 27.419117
      LE Set Random Address (0x08|0x0005) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7  #3 [hci0] 27.419244
        Type: Active (0x01)
        Interval: 11.250 msec (0x0012)
        Window: 11.250 msec (0x0012)
        Own address type: Random (0x01)
        Filter policy: Accept all advertisement (0x00)
> HCI Event: Command Complete (0x0e) plen 4                 #4 [hci0] 27.420131
      LE Set Random Address (0x08|0x0005) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2      #5 [hci0] 27.420259
        Scanning: Enabled (0x01)
        Filter duplicates: Enabled (0x01)
> HCI Event: Command Complete (0x0e) plen 4                 torvalds#6 [hci0] 27.420969
      LE Set Scan Parameters (0x08|0x000b) ncmd 1
        Status: Success (0x00)
> HCI Event: Command Complete (0x0e) plen 4                 torvalds#7 [hci0] 27.421983
      LE Set Scan Enable (0x08|0x000c) ncmd 1
        Status: Success (0x00)
@ MGMT Event: Command Complete (0x0001) plen 4        {0x0003} [hci0] 27.422059
      Start Discovery (0x0023) plen 1
        Status: Success (0x00)
        Address type: 0x06
          LE Public
          LE Random
@ MGMT Event: Discovering (0x0013) plen 2             {0x0003} [hci0] 27.422067
        Address type: 0x06
          LE Public
          LE Random
        Discovery: Enabled (0x01)
@ MGMT Event: Discovering (0x0013) plen 2             {0x0002} [hci0] 27.422067
        Address type: 0x06
          LE Public
          LE Random
        Discovery: Enabled (0x01)
@ MGMT Event: Discovering (0x0013) plen 2             {0x0001} [hci0] 27.422067
        Address type: 0x06
          LE Public
          LE Random
        Discovery: Enabled (0x01)

Signed-off-by: João Paulo Rechi Vita <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
neilbrown pushed a commit that referenced this pull request Jan 1, 2020
[ Upstream commit 41a91c6 ]

dwc3_gadget_suspend() is called under dwc->lock spinlock. In such context
calling synchronize_irq() is not allowed. Move the problematic call out
of the protected block to fix the following kernel BUG during system
suspend:

BUG: sleeping function called from invalid context at kernel/irq/manage.c:112
in_atomic(): 1, irqs_disabled(): 128, pid: 1601, name: rtcwake
6 locks held by rtcwake/1601:
 #0: f70ac2a2 (sb_writers#7){.+.+}, at: vfs_write+0x130/0x16c
 #1: b5fe1270 (&of->mutex){+.+.}, at: kernfs_fop_write+0xc0/0x1e4
 #2: 7e597705 (kn->count#60){.+.+}, at: kernfs_fop_write+0xc8/0x1e4
 #3: 8b3527d0 (system_transition_mutex){+.+.}, at: pm_suspend+0xc4/0xc04
 #4: fc7f1c42 (&dev->mutex){....}, at: __device_suspend+0xd8/0x74c
 #5: 4b36507e (&(&dwc->lock)->rlock){....}, at: dwc3_gadget_suspend+0x24/0x3c
irq event stamp: 11252
hardirqs last  enabled at (11251): [<c09c54a4>] _raw_spin_unlock_irqrestore+0x6c/0x74
hardirqs last disabled at (11252): [<c09c4d44>] _raw_spin_lock_irqsave+0x1c/0x5c
softirqs last  enabled at (9744): [<c0102564>] __do_softirq+0x3a4/0x66c
softirqs last disabled at (9737): [<c0128528>] irq_exit+0x140/0x168
Preemption disabled at:
[<00000000>]   (null)
CPU: 7 PID: 1601 Comm: rtcwake Not tainted
5.0.0-rc3-next-20190122-00039-ga3f4ee4f8a52 #5252
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[<c01110f0>] (unwind_backtrace) from [<c010d120>] (show_stack+0x10/0x14)
[<c010d120>] (show_stack) from [<c09a4d04>] (dump_stack+0x90/0xc8)
[<c09a4d04>] (dump_stack) from [<c014c700>] (___might_sleep+0x22c/0x2c8)
[<c014c700>] (___might_sleep) from [<c0189d68>] (synchronize_irq+0x28/0x84)
[<c0189d68>] (synchronize_irq) from [<c05cbbf8>] (dwc3_gadget_suspend+0x34/0x3c)
[<c05cbbf8>] (dwc3_gadget_suspend) from [<c05bd020>] (dwc3_suspend_common+0x154/0x410)
[<c05bd020>] (dwc3_suspend_common) from [<c05bd34c>] (dwc3_suspend+0x14/0x2c)
[<c05bd34c>] (dwc3_suspend) from [<c051c730>] (platform_pm_suspend+0x2c/0x54)
[<c051c730>] (platform_pm_suspend) from [<c05285d4>] (dpm_run_callback+0xa4/0x3dc)
[<c05285d4>] (dpm_run_callback) from [<c0528a40>] (__device_suspend+0x134/0x74c)
[<c0528a40>] (__device_suspend) from [<c052c508>] (dpm_suspend+0x174/0x588)
[<c052c508>] (dpm_suspend) from [<c0182134>] (suspend_devices_and_enter+0xc0/0xe74)
[<c0182134>] (suspend_devices_and_enter) from [<c0183658>] (pm_suspend+0x770/0xc04)
[<c0183658>] (pm_suspend) from [<c0180ddc>] (state_store+0x6c/0xcc)
[<c0180ddc>] (state_store) from [<c09a9a70>] (kobj_attr_store+0x14/0x20)
[<c09a9a70>] (kobj_attr_store) from [<c02d6800>] (sysfs_kf_write+0x4c/0x50)
[<c02d6800>] (sysfs_kf_write) from [<c02d594c>] (kernfs_fop_write+0xfc/0x1e4)
[<c02d594c>] (kernfs_fop_write) from [<c02593d8>] (__vfs_write+0x2c/0x160)
[<c02593d8>] (__vfs_write) from [<c0259694>] (vfs_write+0xa4/0x16c)
[<c0259694>] (vfs_write) from [<c0259870>] (ksys_write+0x40/0x8c)
[<c0259870>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
Exception stack(0xed55ffa8 to 0xed55fff0)
...

Fixes: 01c1088 ("usb: dwc3: gadget: synchronize_irq dwc irq in suspend")
Signed-off-by: Marek Szyprowski <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
neilbrown pushed a commit that referenced this pull request Jan 1, 2020
[ Upstream commit ff612ba ]

We've been seeing the following sporadically throughout our fleet

panic: kernel BUG at fs/btrfs/relocation.c:4584!
netversion: 5.0-0
Backtrace:
 #0 [ffffc90003adb880] machine_kexec at ffffffff81041da8
 #1 [ffffc90003adb8c8] __crash_kexec at ffffffff8110396c
 #2 [ffffc90003adb988] crash_kexec at ffffffff811048ad
 #3 [ffffc90003adb9a0] oops_end at ffffffff8101c19a
 #4 [ffffc90003adb9c0] do_trap at ffffffff81019114
 #5 [ffffc90003adba00] do_error_trap at ffffffff810195d0
 torvalds#6 [ffffc90003adbab0] invalid_op at ffffffff81a00a9b
    [exception RIP: btrfs_reloc_cow_block+692]
    RIP: ffffffff8143b614  RSP: ffffc90003adbb68  RFLAGS: 00010246
    RAX: fffffffffffffff7  RBX: ffff8806b9c32000  RCX: ffff8806aad00690
    RDX: ffff880850b295e0  RSI: ffff8806b9c32000  RDI: ffff88084f205bd0
    RBP: ffff880849415000   R8: ffffc90003adbbe0   R9: ffff88085ac90000
    R10: ffff8805f7369140  R11: 0000000000000000  R12: ffff880850b295e0
    R13: ffff88084f205bd0  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 torvalds#7 [ffffc90003adbbb0] __btrfs_cow_block at ffffffff813bf1cd
 torvalds#8 [ffffc90003adbc28] btrfs_cow_block at ffffffff813bf4b3
 torvalds#9 [ffffc90003adbc78] btrfs_search_slot at ffffffff813c2e6c

The way relocation moves data extents is by creating a reloc inode and
preallocating extents in this inode and then copying the data into these
preallocated extents.  Once we've done this for all of our extents,
we'll write out these dirty pages, which marks the extent written, and
goes into btrfs_reloc_cow_block().  From here we get our current
reloc_control, which _should_ match the reloc_control for the current
block group we're relocating.

However if we get an ENOSPC in this path at some point we'll bail out,
never initiating writeback on this inode.  Not a huge deal, unless we
happen to be doing relocation on a different block group, and this block
group is now rc->stage == UPDATE_DATA_PTRS.  This trips the BUG_ON() in
btrfs_reloc_cow_block(), because we expect to be done modifying the data
inode.  We are in fact done modifying the metadata for the data inode
we're currently using, but not the one from the failed block group, and
thus we BUG_ON().

(This happens when writeback finishes for extents from the previous
group, when we are at btrfs_finish_ordered_io() which updates the data
reloc tree (inode item, drops/adds extent items, etc).)

Fix this by writing out the reloc data inode always, and then breaking
out of the loop after that point to keep from tripping this BUG_ON()
later.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: Filipe Manana <[email protected]>
[ add note from Filipe ]
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
neilbrown pushed a commit that referenced this pull request Jan 1, 2020
[ Upstream commit 7603df3 ]

[    3.707412] Unable to handle kernel NULL pointer dereference at virtual address 0000009c
[    3.714511] pgd = (ptrval)
[    3.722742] [0000009c] *pgd=00000000
[    3.725238] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[    3.728968] Modules linked in:
[    3.734265] CPU: 3 PID: 112 Comm: kworker/3:2 Tainted: G        W         5.0.0-rc7-00183-g06a1c31df9eb #4
[    3.737142] Hardware name: Generic DT based system
[    3.746778] Workqueue: events deferred_probe_work_func
[    3.751542] PC is at msm_gem_map_vma+0x3c/0xac
[    3.756669] LR is at msm_gem_get_and_pin_iova+0xd8/0x134
[    3.761086] pc : [<c07d3b7c>]    lr : [<c07d14f8>]    psr: 60000013
[    3.766560] sp : ee297be8  ip : ed9ab1c0  fp : ed93b800
[    3.772546] r10: ee35e180  r9 : 00000000  r8 : ee297c80
[    3.777752] r7 : 00000000  r6 : 7c100000  r5 : 00000000  r4 : ee35e180
[    3.782968] r3 : 00000001  r2 : 00000003  r1 : ee35e180  r0 : 00000000
[    3.789562] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[    3.796079] Control: 10c5787d  Table: 2e3a806  DAC: 00000051
[    3.803282] Process kworker/3:2 (pid: 112, stack limit = 0x(ptrval))
[    3.809006] Stack: (0xee297be8 to 0xee298000)
[    3.815445] 7be0:                   00000000 c1108c48 eda8c000 00000003 eda8c0fc c1108c48
[    3.819715] 7c00: eda8c000 00000003 eda8c0fc c07d14f8 00000001 c07d1100 7c100000 00000000
[    3.827873] 7c20: eda8c000 bb7ffb78 00000000 eda8c000 00000000 00000000 c0c8b1d4 ee3bfa00
[    3.836037] 7c40: ee3b9800 c07d1684 00000000 c1108c48 ee0d7810 ee3b9800 c0c8b1d4 c07d222c
[    3.844193] 7c60: ee3bfd84 ee297c80 00000000 c0b1d5b0 ee3bfc40 c07dcfd8 ee3bfd84 ee297c80
[    3.852357] 7c80: 0000006d ee3bfc40 ee0d7810 bb7ffb78 c0c8b1d4 00000000 ee3bfc40 c07ddb48
[    3.860516] 7ca0: 00002004 c0eba384 ee3bfc40 c079eba0 ee3bd040 ee3b9800 00000001 ed93b800
[    3.868673] 7cc0: ed9aa100 c07db7e8 ee3bf240 ed9a6500 00000001 ee3b9800 ee3bf2d4 c07a0a30
[    3.876834] 7ce0: ed93b800 7d100000 c1108c48 ee0d7610 ee3b9800 ed93b800 c1108c48 00000000
[    3.884991] 7d00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    3.893151] 7d20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 bb7ffb78
[    3.901310] 7d40: c12113c4 ed93b800 ee3b9800 c1108c48 ee9eec10 00000000 ed93b800 7d100000
[    3.909472] 7d60: eff7b000 c07cf748 7d100000 00000000 c0e9a350 c0b1d5b0 c12113c4 c0961e40
[    3.917633] 7d80: c12113c4 40000113 eeff4bec c0ebe004 00000019 c0b1d230 ee9eeda8 60000113
[    3.925791] 7da0: ee35d300 ee9eeda8 c07ce260 bb7ffb78 c07ce260 ee35d2c 00000028 00000002
[    3.933950] 7dc0: eeb76280 c118f884 ee0be640 c11c6128 c07ce260 c07ea4ac 00000000 c0962b48
[    3.942108] 7de0: c118f868 00000001 c0ebbc98 ee35d2c 00000000 eeb76280 00000000 c118f87c
[    3.950270] 7e00: ee35d2c 00000000 c11c63e0 c118f694 00000019 c07ea5d0 ee0d7810 00000000
[    3.958430] 7e20: c118f694 00000000 00000000 c07f2b0c c120f55c ee0d7810 c120f560 00000000
[    3.966590] 7e40: 00000000 c07f08c4 c07f0e8c ee0d7810 c11ba3d0 ee0d7810 c118f694 c07f0e8c
[    3.974748] 7e60: c1108c48 00000001 c0ebc3cc c11c63f8 c11ba3d0 c07f0c08 00000001 c07f2f8c
[    3.982908] 7e80: c118f694 00000000 ee297ed4 c07f0e8c c1108c48 00000001 c0ebc3cc c11c63f8
[    3.991068] 7ea0: c11ba3d0 c07ee8a0 c11ba3d0 ee82686c ee0baf38 bb7ffb78 ee0d7810 ee0d7810
[    3.999227] 7ec0: c1108c48 ee0d7844 c118faac c07f05b0 ee0d7810 ee0d7810 00000001 bb7ffb78
[    4.007389] 7ee0: ee0d7810 ee0d7810 c118fd18 c118faac c11c63e0 c07ef7d0 ee0d7810 c118fa90
[    4.015548] 7f00: c118fa90 c07efd68 c118fac8 ee27fe00 eefd9c80 eefdcd00 00000000 c118facc
[    4.023708] 7f20: 00000000 c033c038 eefd9c80 eefd9c80 00000008 ee27fe00 ee27fe14 eefd9c80
[    4.031866] 7f40: 00000008 c1103d00 eefd9c98 ee296000 eefd9c80 c033ce54 ee907eac c0b1d230
[    4.040026] 7f60: ee907eac eea24440 ee285000 00000000 ee296000 ee27fe00 c033ce24 eea2445c
[    4.048188] 7f80: ee907eac c0341db0 00000000 ee285000 c0341c8c 00000000 00000000 00000000
[    4.056346] 7fa0: 00000000 00000000 00000000 c03010e8 00000000 00000000 00000000 00000000
[    4.064505] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    4.072665] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[    4.080828] [<c07d3b7c>] (msm_gem_map_vma) from [<c07d14f8>] (msm_gem_get_and_pin_iova+0xd8/0x134)
[    4.088983] [<c07d14f8>] (msm_gem_get_and_pin_iova) from [<c07d1684>] (_msm_gem_kernel_new+0x38/0xac)
[    4.097839] [<c07d1684>] (_msm_gem_kernel_new) from [<c07d222c>] (msm_gem_kernel_new+0x24/0x2c)
[    4.107130] [<c07d222c>] (msm_gem_kernel_new) from [<c07dcfd8>] (dsi_tx_buf_alloc_6g+0x44/0x90)
[    4.115631] [<c07dcfd8>] (dsi_tx_buf_alloc_6g) from [<c07ddb48>] (msm_dsi_host_modeset_init+0x80/0x104)
[    4.124313] [<c07ddb48>] (msm_dsi_host_modeset_init) from [<c07db7e8>] (msm_dsi_modeset_init+0x34/0x1c0)
[    4.133691] [<c07db7e8>] (msm_dsi_modeset_init) from [<c07a0a30>] (mdp5_kms_init+0x764/0x7e0)
[    4.143409] [<c07a0a30>] (mdp5_kms_init) from [<c07cf748>] (msm_drm_bind+0x56c/0x740)
[    4.151824] [<c07cf748>] (msm_drm_bind) from [<c07ea4ac>] (try_to_bring_up_master+0x238/0x2b4)
[    4.159636] [<c07ea4ac>] (try_to_bring_up_master) from [<c07ea5d0>] (component_add+0xa8/0x170)
[    4.168146] [<c07ea5d0>] (component_add) from [<c07f2b0c>] (platform_drv_probe+0x48/0x9c)
[    4.176737] [<c07f2b0c>] (platform_drv_probe) from [<c07f08c4>] (really_probe+0x278/0x404)
[    4.184981] [<c07f08c4>] (really_probe) from [<c07f0c08>] (driver_probe_device+0x78/0x1c0)
[    4.193147] [<c07f0c08>] (driver_probe_device) from [<c07ee8a0>] (bus_for_each_drv+0x74/0xb8)
[    4.201389] [<c07ee8a0>] (bus_for_each_drv) from [<c07f05b0>] (__device_attach+0xd0/0x164)
[    4.209984] [<c07f05b0>] (__device_attach) from [<c07ef7d0>] (bus_probe_device+0x84/0x8c)
[    4.218143] [<c07ef7d0>] (bus_probe_device) from [<c07efd68>] (deferred_probe_work_func+0x48/0xc4)
[    4.226398] [<c07efd68>] (deferred_probe_work_func) from [<c033c038>] (process_one_work+0x204/0x574)
[    4.235254] [<c033c038>] (process_one_work) from [<c033ce54>] (worker_thread+0x30/0x560)
[    4.244534] [<c033ce54>] (worker_thread) from [<c0341db0>] (kthread+0x124/0x154)
[    4.252606] [<c0341db0>] (kthread) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
[    4.259966] Exception stack(0xee297fb0 to 0xee297ff8)
[    4.266998] 7fa0:                                     00000000 00000000 00000000 00000000
[    4.272143] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    4.280297] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    4.288451] Code: e5813080 1a000013 e3a03001 e5c4307c (e590009c)
[    4.294933] ---[ end trace 18729cc2bca2b4b3 ]---

Signed-off-by: Luca Weiss <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
neilbrown pushed a commit that referenced this pull request Jan 1, 2020
…text

commit 0c9e8b3 upstream.

stub_probe() and stub_disconnect() call functions which could call
sleeping function in invalid context whil holding busid_lock.

Fix the problem by refining the lock holds to short critical sections
to change the busid_priv fields. This fix restructures the code to
limit the lock holds in stub_probe() and stub_disconnect().

stub_probe():

[15217.927028] BUG: sleeping function called from invalid context at mm/slab.h:418
[15217.927038] in_atomic(): 1, irqs_disabled(): 0, pid: 29087, name: usbip
[15217.927044] 5 locks held by usbip/29087:
[15217.927047]  #0: 0000000091647f28 (sb_writers#6){....}, at: vfs_write+0x191/0x1c0
[15217.927062]  #1: 000000008f9ba75b (&of->mutex){....}, at: kernfs_fop_write+0xf7/0x1b0
[15217.927072]  #2: 00000000872e5b4b (&dev->mutex){....}, at: __device_driver_lock+0x3b/0x50
[15217.927082]  #3: 00000000e74ececc (&dev->mutex){....}, at: __device_driver_lock+0x46/0x50
[15217.927090]  #4: 00000000b20abbe0 (&(&busid_table[i].busid_lock)->rlock){....}, at: get_busid_priv+0x48/0x60 [usbip_host]
[15217.927103] CPU: 3 PID: 29087 Comm: usbip Tainted: G        W         5.1.0-rc6+ torvalds#40
[15217.927106] Hardware name: Dell Inc. OptiPlex 790/0HY9JP, BIOS A18 09/24/2013
[15217.927109] Call Trace:
[15217.927118]  dump_stack+0x63/0x85
[15217.927127]  ___might_sleep+0xff/0x120
[15217.927133]  __might_sleep+0x4a/0x80
[15217.927143]  kmem_cache_alloc_trace+0x1aa/0x210
[15217.927156]  stub_probe+0xe8/0x440 [usbip_host]
[15217.927171]  usb_probe_device+0x34/0x70

stub_disconnect():

[15279.182478] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:908
[15279.182487] in_atomic(): 1, irqs_disabled(): 0, pid: 29114, name: usbip
[15279.182492] 5 locks held by usbip/29114:
[15279.182494]  #0: 0000000091647f28 (sb_writers#6){....}, at: vfs_write+0x191/0x1c0
[15279.182506]  #1: 00000000702cf0f3 (&of->mutex){....}, at: kernfs_fop_write+0xf7/0x1b0
[15279.182514]  #2: 00000000872e5b4b (&dev->mutex){....}, at: __device_driver_lock+0x3b/0x50
[15279.182522]  #3: 00000000e74ececc (&dev->mutex){....}, at: __device_driver_lock+0x46/0x50
[15279.182529]  #4: 00000000b20abbe0 (&(&busid_table[i].busid_lock)->rlock){....}, at: get_busid_priv+0x48/0x60 [usbip_host]
[15279.182541] CPU: 0 PID: 29114 Comm: usbip Tainted: G        W         5.1.0-rc6+ torvalds#40
[15279.182543] Hardware name: Dell Inc. OptiPlex 790/0HY9JP, BIOS A18 09/24/2013
[15279.182546] Call Trace:
[15279.182554]  dump_stack+0x63/0x85
[15279.182561]  ___might_sleep+0xff/0x120
[15279.182566]  __might_sleep+0x4a/0x80
[15279.182574]  __mutex_lock+0x55/0x950
[15279.182582]  ? get_busid_priv+0x48/0x60 [usbip_host]
[15279.182587]  ? reacquire_held_locks+0xec/0x1a0
[15279.182591]  ? get_busid_priv+0x48/0x60 [usbip_host]
[15279.182597]  ? find_held_lock+0x94/0xa0
[15279.182609]  mutex_lock_nested+0x1b/0x20
[15279.182614]  ? mutex_lock_nested+0x1b/0x20
[15279.182618]  kernfs_remove_by_name_ns+0x2a/0x90
[15279.182625]  sysfs_remove_file_ns+0x15/0x20
[15279.182629]  device_remove_file+0x19/0x20
[15279.182634]  stub_disconnect+0x6d/0x180 [usbip_host]
[15279.182643]  usb_unbind_device+0x27/0x60

Signed-off-by: Shuah Khan <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
neilbrown pushed a commit that referenced this pull request Jan 1, 2020
…rupts

commit ef97402 upstream.

The passthrough interrupts are defined at the host level and their IRQ
data should not be cleared unless specifically deconfigured (shutdown)
by the host. They differ from the IPI interrupts which are allocated
by the XIVE KVM device and reserved to the guest usage only.

This fixes a host crash when destroying a VM in which a PCI adapter
was passed-through. In this case, the interrupt is cleared and freed
by the KVM device and then shutdown by vfio at the host level.

[ 1007.360265] BUG: Kernel NULL pointer dereference at 0x00000d00
[ 1007.360285] Faulting instruction address: 0xc00000000009da34
[ 1007.360296] Oops: Kernel access of bad area, sig: 7 [#1]
[ 1007.360303] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[ 1007.360314] Modules linked in: vhost_net vhost iptable_mangle ipt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc kvm_hv kvm xt_tcpudp iptable_filter squashfs fuse binfmt_misc vmx_crypto ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi nfsd ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq multipath mlx5_ib ib_uverbs ib_core crc32c_vpmsum mlx5_core
[ 1007.360425] CPU: 9 PID: 15576 Comm: CPU 18/KVM Kdump: loaded Not tainted 5.1.0-gad7e7d0ef #4
[ 1007.360454] NIP:  c00000000009da34 LR: c00000000009e50c CTR: c00000000009e5d0
[ 1007.360482] REGS: c000007f24ccf330 TRAP: 0300   Not tainted  (5.1.0-gad7e7d0ef)
[ 1007.360500] MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002484  XER: 00000000
[ 1007.360532] CFAR: c00000000009da10 DAR: 0000000000000d00 DSISR: 00080000 IRQMASK: 1
[ 1007.360532] GPR00: c00000000009e62c c000007f24ccf5c0 c000000001510600 c000007fe7f947c0
[ 1007.360532] GPR04: 0000000000000d00 0000000000000000 0000000000000000 c000005eff02d200
[ 1007.360532] GPR08: 0000000000400000 0000000000000000 0000000000000000 fffffffffffffffd
[ 1007.360532] GPR12: c00000000009e5d0 c000007fffff7b00 0000000000000031 000000012c345718
[ 1007.360532] GPR16: 0000000000000000 0000000000000008 0000000000418004 0000000000040100
[ 1007.360532] GPR20: 0000000000000000 0000000008430000 00000000003c0000 0000000000000027
[ 1007.360532] GPR24: 00000000000000ff 0000000000000000 00000000000000ff c000007faa90d98c
[ 1007.360532] GPR28: c000007faa90da40 00000000000fe040 ffffffffffffffff c000007fe7f947c0
[ 1007.360689] NIP [c00000000009da34] xive_esb_read+0x34/0x120
[ 1007.360706] LR [c00000000009e50c] xive_do_source_set_mask.part.0+0x2c/0x50
[ 1007.360732] Call Trace:
[ 1007.360738] [c000007f24ccf5c0] [c000000000a6383c] snooze_loop+0x15c/0x270 (unreliable)
[ 1007.360775] [c000007f24ccf5f0] [c00000000009e62c] xive_irq_shutdown+0x5c/0xe0
[ 1007.360795] [c000007f24ccf630] [c00000000019e4a0] irq_shutdown+0x60/0xe0
[ 1007.360813] [c000007f24ccf660] [c000000000198c44] __free_irq+0x3a4/0x420
[ 1007.360831] [c000007f24ccf700] [c000000000198dc8] free_irq+0x78/0xe0
[ 1007.360849] [c000007f24ccf730] [c00000000096c5a8] vfio_msi_set_vector_signal+0xa8/0x350
[ 1007.360878] [c000007f24ccf7f0] [c00000000096c938] vfio_msi_set_block+0xe8/0x1e0
[ 1007.360899] [c000007f24ccf850] [c00000000096cae0] vfio_msi_disable+0xb0/0x110
[ 1007.360912] [c000007f24ccf8a0] [c00000000096cd04] vfio_pci_set_msi_trigger+0x1c4/0x3d0
[ 1007.360922] [c000007f24ccf910] [c00000000096d910] vfio_pci_set_irqs_ioctl+0xa0/0x170
[ 1007.360941] [c000007f24ccf930] [c00000000096b400] vfio_pci_disable+0x80/0x5e0
[ 1007.360963] [c000007f24ccfa10] [c00000000096b9bc] vfio_pci_release+0x5c/0x90
[ 1007.360991] [c000007f24ccfa40] [c000000000963a9c] vfio_device_fops_release+0x3c/0x70
[ 1007.361012] [c000007f24ccfa70] [c0000000003b5668] __fput+0xc8/0x2b0
[ 1007.361040] [c000007f24ccfac0] [c0000000001409b0] task_work_run+0x140/0x1b0
[ 1007.361059] [c000007f24ccfb20] [c000000000118f8c] do_exit+0x3ac/0xd00
[ 1007.361076] [c000007f24ccfc00] [c0000000001199b0] do_group_exit+0x60/0x100
[ 1007.361094] [c000007f24ccfc40] [c00000000012b514] get_signal+0x1a4/0x8f0
[ 1007.361112] [c000007f24ccfd30] [c000000000021cc8] do_notify_resume+0x1a8/0x430
[ 1007.361141] [c000007f24ccfe20] [c00000000000e444] ret_from_except_lite+0x70/0x74
[ 1007.361159] Instruction dump:
[ 1007.361175] 38422c00 e9230000 712a0004 41820010 548a2036 7d442378 78840020 71290020
[ 1007.361194] 4082004c e9230010 7c892214 7c0004ac <e9240000> 0c090000 4c00012c 792a0022

Cc: [email protected] # v4.12+
Fixes: 5af5099 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
Signed-off-by: Cédric Le Goater <[email protected]>
Signed-off-by: Greg Kurz <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
neilbrown pushed a commit that referenced this pull request Jan 1, 2020
commit 52b042a upstream.

Commit b7dbcc0 "NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter"
found this bug. However it didn't fix it.

This commit replaces schedule_timeout() with wait_woken() and
default_wake_function() with woken_wake_function() in function
nfs4_retry_setlk() and nfs4_wake_lock_waiter(). wait_woken() uses
memory barriers in its implementation to avoid potential race condition
when putting a process into sleeping state and then waking it up.

Fixes: a1d617d ("nfs: allow blocking locks to be awoken by lock callbacks")
Cc: [email protected] #4.9+
Signed-off-by: Yihao Wu <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
neilbrown pushed a commit that referenced this pull request Jan 1, 2020
commit ba851a3 upstream.

When a waiter is waked by CB_NOTIFY_LOCK, it will retry
nfs4_proc_setlk(). The waiter may fail to nfs4_proc_setlk() and sleep
again. However, the waiter is already removed from clp->cl_lock_waitq
when handling CB_NOTIFY_LOCK in nfs4_wake_lock_waiter(). So any
subsequent CB_NOTIFY_LOCK won't wake this waiter anymore. We should
put the waiter back to clp->cl_lock_waitq before retrying.

Cc: [email protected] #4.9+
Signed-off-by: Yihao Wu <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
neilbrown pushed a commit that referenced this pull request Jan 1, 2020
[ Upstream commit 198790d ]

In free_percpu() we sometimes call pcpu_schedule_balance_work() to
queue a work item (which does a wakeup) while holding pcpu_lock.
This creates an unnecessary lock dependency between pcpu_lock and
the scheduler's pi_lock.  There are other places where we call
pcpu_schedule_balance_work() without hold pcpu_lock, and this case
doesn't need to be different.

Moving the call outside the lock prevents the following lockdep splat
when running tools/testing/selftests/bpf/{test_maps,test_progs} in
sequence with lockdep enabled:

======================================================
WARNING: possible circular locking dependency detected
5.1.0-dbg-DEV #1 Not tainted
------------------------------------------------------
kworker/23:255/18872 is trying to acquire lock:
000000000bc79290 (&(&pool->lock)->rlock){-.-.}, at: __queue_work+0xb2/0x520

but task is already holding lock:
00000000e3e7a6aa (pcpu_lock){..-.}, at: free_percpu+0x36/0x260

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (pcpu_lock){..-.}:
       lock_acquire+0x9e/0x180
       _raw_spin_lock_irqsave+0x3a/0x50
       pcpu_alloc+0xfa/0x780
       __alloc_percpu_gfp+0x12/0x20
       alloc_htab_elem+0x184/0x2b0
       __htab_percpu_map_update_elem+0x252/0x290
       bpf_percpu_hash_update+0x7c/0x130
       __do_sys_bpf+0x1912/0x1be0
       __x64_sys_bpf+0x1a/0x20
       do_syscall_64+0x59/0x400
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #3 (&htab->buckets[i].lock){....}:
       lock_acquire+0x9e/0x180
       _raw_spin_lock_irqsave+0x3a/0x50
       htab_map_update_elem+0x1af/0x3a0

-> #2 (&rq->lock){-.-.}:
       lock_acquire+0x9e/0x180
       _raw_spin_lock+0x2f/0x40
       task_fork_fair+0x37/0x160
       sched_fork+0x211/0x310
       copy_process.part.43+0x7b1/0x2160
       _do_fork+0xda/0x6b0
       kernel_thread+0x29/0x30
       rest_init+0x22/0x260
       arch_call_rest_init+0xe/0x10
       start_kernel+0x4fd/0x520
       x86_64_start_reservations+0x24/0x26
       x86_64_start_kernel+0x6f/0x72
       secondary_startup_64+0xa4/0xb0

-> #1 (&p->pi_lock){-.-.}:
       lock_acquire+0x9e/0x180
       _raw_spin_lock_irqsave+0x3a/0x50
       try_to_wake_up+0x41/0x600
       wake_up_process+0x15/0x20
       create_worker+0x16b/0x1e0
       workqueue_init+0x279/0x2ee
       kernel_init_freeable+0xf7/0x288
       kernel_init+0xf/0x180
       ret_from_fork+0x24/0x30

-> #0 (&(&pool->lock)->rlock){-.-.}:
       __lock_acquire+0x101f/0x12a0
       lock_acquire+0x9e/0x180
       _raw_spin_lock+0x2f/0x40
       __queue_work+0xb2/0x520
       queue_work_on+0x38/0x80
       free_percpu+0x221/0x260
       pcpu_freelist_destroy+0x11/0x20
       stack_map_free+0x2a/0x40
       bpf_map_free_deferred+0x3c/0x50
       process_one_work+0x1f7/0x580
       worker_thread+0x54/0x410
       kthread+0x10f/0x150
       ret_from_fork+0x24/0x30

other info that might help us debug this:

Chain exists of:
  &(&pool->lock)->rlock --> &htab->buckets[i].lock --> pcpu_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(pcpu_lock);
                               lock(&htab->buckets[i].lock);
                               lock(pcpu_lock);
  lock(&(&pool->lock)->rlock);

 *** DEADLOCK ***

3 locks held by kworker/23:255/18872:
 #0: 00000000b36a6e16 ((wq_completion)events){+.+.},
     at: process_one_work+0x17a/0x580
 #1: 00000000dfd966f0 ((work_completion)(&map->work)){+.+.},
     at: process_one_work+0x17a/0x580
 #2: 00000000e3e7a6aa (pcpu_lock){..-.},
     at: free_percpu+0x36/0x260

stack backtrace:
CPU: 23 PID: 18872 Comm: kworker/23:255 Not tainted 5.1.0-dbg-DEV #1
Hardware name: ...
Workqueue: events bpf_map_free_deferred
Call Trace:
 dump_stack+0x67/0x95
 print_circular_bug.isra.38+0x1c6/0x220
 check_prev_add.constprop.50+0x9f6/0xd20
 __lock_acquire+0x101f/0x12a0
 lock_acquire+0x9e/0x180
 _raw_spin_lock+0x2f/0x40
 __queue_work+0xb2/0x520
 queue_work_on+0x38/0x80
 free_percpu+0x221/0x260
 pcpu_freelist_destroy+0x11/0x20
 stack_map_free+0x2a/0x40
 bpf_map_free_deferred+0x3c/0x50
 process_one_work+0x1f7/0x580
 worker_thread+0x54/0x410
 kthread+0x10f/0x150
 ret_from_fork+0x24/0x30

Signed-off-by: John Sperbeck <[email protected]>
Signed-off-by: Dennis Zhou <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
neilbrown pushed a commit that referenced this pull request Jan 1, 2020
[ Upstream commit 689a586 ]

Memory: 509108K/542612K available (3835K kernel code, 919K rwdata, 1028K rodata, 129K init, 211K bss, 33504K reserved, 0K cma-reserved)
NR_IRQS: 15
clocksource: timer: mask: 0xffffffffffffffff max_cycles: 0x1cd42e205, max_idle_ns: 881590404426 ns
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at kernel/time/clockevents.c:458 clockevents_register_device+0x72/0x140
posix-timer cpumask == cpu_all_mask, using cpu_possible_mask instead
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-rc4-00048-ged79cc87302b #4
Stack:
 604ebda0 603c5370 604ebe20 6046fd17
 00000000 6006fcbb 604ebdb0 603c53b5
 604ebe10 6003bfc4 604ebdd0 9000001ca
Call Trace:
 [<6006fcbb>] ? printk+0x0/0x94
 [<60083160>] ? clockevents_register_device+0x72/0x140
 [<6001f16e>] show_stack+0x13b/0x155
 [<603c5370>] ? dump_stack_print_info+0xe2/0xeb
 [<6006fcbb>] ? printk+0x0/0x94
 [<603c53b5>] dump_stack+0x2a/0x2c
 [<6003bfc4>] __warn+0x10e/0x13e
 [<60070320>] ? vprintk_func+0xc8/0xcf
 [<60030fd6>] ? block_signals+0x0/0x16
 [<6006fcbb>] ? printk+0x0/0x94
 [<6003c08b>] warn_slowpath_fmt+0x97/0x99
 [<600311a1>] ? set_signals+0x0/0x3f
 [<6003bff4>] ? warn_slowpath_fmt+0x0/0x99
 [<600842cb>] ? tick_oneshot_mode_active+0x44/0x4f
 [<60030fd6>] ? block_signals+0x0/0x16
 [<6006fcbb>] ? printk+0x0/0x94
 [<6007d2d5>] ? __clocksource_select+0x20/0x1b1
 [<60030fd6>] ? block_signals+0x0/0x16
 [<6006fcbb>] ? printk+0x0/0x94
 [<60083160>] clockevents_register_device+0x72/0x140
 [<60031192>] ? get_signals+0x0/0xf
 [<60030fd6>] ? block_signals+0x0/0x16
 [<6006fcbb>] ? printk+0x0/0x94
 [<60002eec>] um_timer_setup+0xc8/0xca
 [<60001b59>] start_kernel+0x47f/0x57e
 [<600035bc>] start_kernel_proc+0x49/0x4d
 [<6006c483>] ? kmsg_dump_register+0x82/0x8a
 [<6001de62>] new_thread_handler+0x81/0xb2
 [<60003571>] ? kmsg_dumper_stdout_init+0x1a/0x1c
 [<60020c75>] uml_finishsetup+0x54/0x59

random: get_random_bytes called from init_oops_id+0x27/0x34 with crng_init=0
---[ end trace 00173d0117a88acb ]---
Calibrating delay loop... 6941.90 BogoMIPS (lpj=34709504)

Signed-off-by: Maciej Żenczykowski <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Anton Ivanov <[email protected]>
Cc: [email protected]
Cc: [email protected]

Signed-off-by: Richard Weinberger <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
neilbrown pushed a commit that referenced this pull request Jan 7, 2020
We got another syzbot report [1] that tells us we must use
write_lock_irq()/write_unlock_irq() to avoid possible deadlock.

[1]

WARNING: inconsistent lock state
5.5.0-rc1-syzkaller #0 Not tainted
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-R} usage.
syz-executor826/9605 [HC1[1]:SC0[0]:HE0:SE1] takes:
ffffffff8a128718 (disc_data_lock){+-..}, at: sp_get.isra.0+0x1d/0xf0 drivers/net/ppp/ppp_synctty.c:138
{HARDIRQ-ON-W} state was registered at:
  lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4485
  __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
  _raw_write_lock_bh+0x33/0x50 kernel/locking/spinlock.c:319
  sixpack_close+0x1d/0x250 drivers/net/hamradio/6pack.c:657
  tty_ldisc_close.isra.0+0x119/0x1a0 drivers/tty/tty_ldisc.c:489
  tty_set_ldisc+0x230/0x6b0 drivers/tty/tty_ldisc.c:585
  tiocsetd drivers/tty/tty_io.c:2337 [inline]
  tty_ioctl+0xe8d/0x14f0 drivers/tty/tty_io.c:2597
  vfs_ioctl fs/ioctl.c:47 [inline]
  file_ioctl fs/ioctl.c:545 [inline]
  do_vfs_ioctl+0x977/0x14e0 fs/ioctl.c:732
  ksys_ioctl+0xab/0xd0 fs/ioctl.c:749
  __do_sys_ioctl fs/ioctl.c:756 [inline]
  __se_sys_ioctl fs/ioctl.c:754 [inline]
  __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:754
  do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
irq event stamp: 3946
hardirqs last  enabled at (3945): [<ffffffff87c86e43>] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:168 [inline]
hardirqs last  enabled at (3945): [<ffffffff87c86e43>] _raw_spin_unlock_irq+0x23/0x80 kernel/locking/spinlock.c:199
hardirqs last disabled at (3946): [<ffffffff8100675f>] trace_hardirqs_off_thunk+0x1a/0x1c arch/x86/entry/thunk_64.S:42
softirqs last  enabled at (2658): [<ffffffff86a8b4df>] spin_unlock_bh include/linux/spinlock.h:383 [inline]
softirqs last  enabled at (2658): [<ffffffff86a8b4df>] clusterip_netdev_event+0x46f/0x670 net/ipv4/netfilter/ipt_CLUSTERIP.c:222
softirqs last disabled at (2656): [<ffffffff86a8b22b>] spin_lock_bh include/linux/spinlock.h:343 [inline]
softirqs last disabled at (2656): [<ffffffff86a8b22b>] clusterip_netdev_event+0x1bb/0x670 net/ipv4/netfilter/ipt_CLUSTERIP.c:196

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(disc_data_lock);
  <Interrupt>
    lock(disc_data_lock);

 *** DEADLOCK ***

5 locks held by syz-executor826/9605:
 #0: ffff8880a905e198 (&tty->legacy_mutex){+.+.}, at: tty_lock+0xc7/0x130 drivers/tty/tty_mutex.c:19
 #1: ffffffff899a56c0 (rcu_read_lock){....}, at: mutex_spin_on_owner+0x0/0x330 kernel/locking/mutex.c:413
 #2: ffff8880a496a2b0 (&(&i->lock)->rlock){-.-.}, at: spin_lock include/linux/spinlock.h:338 [inline]
 #2: ffff8880a496a2b0 (&(&i->lock)->rlock){-.-.}, at: serial8250_interrupt+0x2d/0x1a0 drivers/tty/serial/8250/8250_core.c:116
 #3: ffffffff8c104048 (&port_lock_key){-.-.}, at: serial8250_handle_irq.part.0+0x24/0x330 drivers/tty/serial/8250/8250_port.c:1823
 #4: ffff8880a905e090 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref+0x22/0x90 drivers/tty/tty_ldisc.c:288

stack backtrace:
CPU: 1 PID: 9605 Comm: syz-executor826 Not tainted 5.5.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x197/0x210 lib/dump_stack.c:118
 print_usage_bug.cold+0x327/0x378 kernel/locking/lockdep.c:3101
 valid_state kernel/locking/lockdep.c:3112 [inline]
 mark_lock_irq kernel/locking/lockdep.c:3309 [inline]
 mark_lock+0xbb4/0x1220 kernel/locking/lockdep.c:3666
 mark_usage kernel/locking/lockdep.c:3554 [inline]
 __lock_acquire+0x1e55/0x4a00 kernel/locking/lockdep.c:3909
 lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4485
 __raw_read_lock include/linux/rwlock_api_smp.h:149 [inline]
 _raw_read_lock+0x32/0x50 kernel/locking/spinlock.c:223
 sp_get.isra.0+0x1d/0xf0 drivers/net/ppp/ppp_synctty.c:138
 sixpack_write_wakeup+0x25/0x340 drivers/net/hamradio/6pack.c:402
 tty_wakeup+0xe9/0x120 drivers/tty/tty_io.c:536
 tty_port_default_wakeup+0x2b/0x40 drivers/tty/tty_port.c:50
 tty_port_tty_wakeup+0x57/0x70 drivers/tty/tty_port.c:387
 uart_write_wakeup+0x46/0x70 drivers/tty/serial/serial_core.c:104
 serial8250_tx_chars+0x495/0xaf0 drivers/tty/serial/8250/8250_port.c:1761
 serial8250_handle_irq.part.0+0x2a2/0x330 drivers/tty/serial/8250/8250_port.c:1834
 serial8250_handle_irq drivers/tty/serial/8250/8250_port.c:1820 [inline]
 serial8250_default_handle_irq+0xc0/0x150 drivers/tty/serial/8250/8250_port.c:1850
 serial8250_interrupt+0xf1/0x1a0 drivers/tty/serial/8250/8250_core.c:126
 __handle_irq_event_percpu+0x15d/0x970 kernel/irq/handle.c:149
 handle_irq_event_percpu+0x74/0x160 kernel/irq/handle.c:189
 handle_irq_event+0xa7/0x134 kernel/irq/handle.c:206
 handle_edge_irq+0x25e/0x8d0 kernel/irq/chip.c:830
 generic_handle_irq_desc include/linux/irqdesc.h:156 [inline]
 do_IRQ+0xde/0x280 arch/x86/kernel/irq.c:250
 common_interrupt+0xf/0xf arch/x86/entry/entry_64.S:607
 </IRQ>
RIP: 0010:cpu_relax arch/x86/include/asm/processor.h:685 [inline]
RIP: 0010:mutex_spin_on_owner+0x247/0x330 kernel/locking/mutex.c:579
Code: c3 be 08 00 00 00 4c 89 e7 e8 e5 06 59 00 4c 89 e0 48 c1 e8 03 42 80 3c 38 00 0f 85 e1 00 00 00 49 8b 04 24 a8 01 75 96 f3 90 <e9> 2f fe ff ff 0f 0b e8 0d 19 09 00 84 c0 0f 85 ff fd ff ff 48 c7
RSP: 0018:ffffc90001eafa20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd7
RAX: 0000000000000000 RBX: ffff88809fd9e0c0 RCX: 1ffffffff13266dd
RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000000
RBP: ffffc90001eafa60 R08: 1ffff11013d22898 R09: ffffed1013d22899
R10: ffffed1013d22898 R11: ffff88809e9144c7 R12: ffff8880a905e138
R13: ffff88809e9144c0 R14: 0000000000000000 R15: dffffc0000000000
 mutex_optimistic_spin kernel/locking/mutex.c:673 [inline]
 __mutex_lock_common kernel/locking/mutex.c:962 [inline]
 __mutex_lock+0x32b/0x13c0 kernel/locking/mutex.c:1106
 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1121
 tty_lock+0xc7/0x130 drivers/tty/tty_mutex.c:19
 tty_release+0xb5/0xe90 drivers/tty/tty_io.c:1665
 __fput+0x2ff/0x890 fs/file_table.c:280
 ____fput+0x16/0x20 fs/file_table.c:313
 task_work_run+0x145/0x1c0 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x8e7/0x2ef0 kernel/exit.c:797
 do_group_exit+0x135/0x360 kernel/exit.c:895
 __do_sys_exit_group kernel/exit.c:906 [inline]
 __se_sys_exit_group kernel/exit.c:904 [inline]
 __x64_sys_exit_group+0x44/0x50 kernel/exit.c:904
 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x43fef8
Code: Bad RIP value.
RSP: 002b:00007ffdb07d2338 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000043fef8
RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
RBP: 00000000004bf730 R08: 00000000000000e7 R09: ffffffffffffffd0
R10: 00000000004002c8 R11: 0000000000000246 R12: 0000000000000001
R13: 00000000006d1180 R14: 0000000000000000 R15: 0000000000000000

Fixes: 6e4e2f8 ("6pack,mkiss: fix lock inconsistency")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
neilbrown pushed a commit that referenced this pull request Jan 7, 2020
The "auto-attach" handler function `gsc_hpdi_auto_attach()` calls
`dma_alloc_coherent()` in a loop to allocate some DMA data buffers, and
also calls it to allocate a buffer for a DMA descriptor chain.  However,
it does not check the return value of any of these calls.  Change
`gsc_hpdi_auto_attach()` to return `-ENOMEM` if any of these
`dma_alloc_coherent()` calls fail.  This will result in the comedi core
calling the "detach" handler `gsc_hpdi_detach()` as part of the
clean-up, which will call `gsc_hpdi_free_dma()` to free any allocated
DMA coherent memory buffers.

Cc: <[email protected]> #4.6+
Signed-off-by: Ian Abbott <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
neilbrown pushed a commit that referenced this pull request Jan 7, 2020
The PTP egress timestamp N must be captured from register PTPEGR_TS[n],
where n = 2 * PORT + TSREG. There are 10 PTPEGR_TS registers, 2 per
port. We are only using TSREG=0.

As opposed to the management slots, which are 4 in number
(SJA1105_NUM_PORTS, minus the CPU port). Any management frame (which
includes PTP frames) can be sent to any non-CPU port through any
management slot. When the CPU port is not the last port (#4), there will
be a mismatch between the slot and the port number.

Luckily, the only mainline occurrence with this switch
(arch/arm/boot/dts/ls1021a-tsn.dts) does have the CPU port as #4, so the
issue did not manifest itself thus far.

Fixes: 47ed985 ("net: dsa: sja1105: Add logic for TX timestamping")
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
neilbrown pushed a commit that referenced this pull request Jan 25, 2020
[ Upstream commit 32546a9 ]

This patch moves the final part of the cifsFileInfo_put() logic where we
need a write lock on lock_sem to be processed in a separate thread that
holds no other locks.
This is to prevent deadlocks like the one below:

> there are 6 processes looping to while trying to down_write
> cinode->lock_sem, 5 of them from _cifsFileInfo_put, and one from
> cifs_new_fileinfo
>
> and there are 5 other processes which are blocked, several of them
> waiting on either PG_writeback or PG_locked (which are both set), all
> for the same page of the file
>
> 2 inode_lock() (inode->i_rwsem) for the file
> 1 wait_on_page_writeback() for the page
> 1 down_read(inode->i_rwsem) for the inode of the directory
> 1 inode_lock()(inode->i_rwsem) for the inode of the directory
> 1 __lock_page
>
>
> so processes are blocked waiting on:
>   page flags PG_locked and PG_writeback for one specific page
>   inode->i_rwsem for the directory
>   inode->i_rwsem for the file
>   cifsInodeInflock_sem
>
>
>
> here are the more gory details (let me know if I need to provide
> anything more/better):
>
> [0 00:48:22.765] [UN]  PID: 8863   TASK: ffff8c691547c5c0  CPU: 3
> COMMAND: "reopen_file"
>  #0 [ffff9965007e3ba8] __schedule at ffffffff9b6e6095
>  #1 [ffff9965007e3c38] schedule at ffffffff9b6e64df
>  #2 [ffff9965007e3c48] rwsem_down_write_slowpath at ffffffff9af283d7
>  #3 [ffff9965007e3cb8] legitimize_path at ffffffff9b0f975d
>  #4 [ffff9965007e3d08] path_openat at ffffffff9b0fe55d
>  #5 [ffff9965007e3dd8] do_filp_open at ffffffff9b100a33
>  torvalds#6 [ffff9965007e3ee0] do_sys_open at ffffffff9b0eb2d6
>  torvalds#7 [ffff9965007e3f38] do_syscall_64 at ffffffff9ae04315
> * (I think legitimize_path is bogus)
>
> in path_openat
>         } else {
>                 const char *s = path_init(nd, flags);
>                 while (!(error = link_path_walk(s, nd)) &&
>                         (error = do_last(nd, file, op)) > 0) {  <<<<
>
> do_last:
>         if (open_flag & O_CREAT)
>                 inode_lock(dir->d_inode);  <<<<
>         else
> so it's trying to take inode->i_rwsem for the directory
>
>      DENTRY           INODE           SUPERBLK     TYPE PATH
> ffff8c68bb8e79c0 ffff8c691158ef20 ffff8c6915bf9000 DIR  /mnt/vm1_smb/
> inode.i_rwsem is ffff8c691158efc0
>
> <struct rw_semaphore 0xffff8c691158efc0>:
>         owner: <struct task_struct 0xffff8c6914275d00> (UN -   8856 -
> reopen_file), counter: 0x0000000000000003
>         waitlist: 2
>         0xffff9965007e3c90     8863   reopen_file      UN 0  1:29:22.926
>   RWSEM_WAITING_FOR_WRITE
>         0xffff996500393e00     9802   ls               UN 0  1:17:26.700
>   RWSEM_WAITING_FOR_READ
>
>
> the owner of the inode.i_rwsem of the directory is:
>
> [0 00:00:00.109] [UN]  PID: 8856   TASK: ffff8c6914275d00  CPU: 3
> COMMAND: "reopen_file"
>  #0 [ffff99650065b828] __schedule at ffffffff9b6e6095
>  #1 [ffff99650065b8b8] schedule at ffffffff9b6e64df
>  #2 [ffff99650065b8c8] schedule_timeout at ffffffff9b6e9f89
>  #3 [ffff99650065b940] msleep at ffffffff9af573a9
>  #4 [ffff99650065b948] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs]
>  #5 [ffff99650065ba38] cifs_writepage_locked at ffffffffc0a0b8f3 [cifs]
>  torvalds#6 [ffff99650065bab0] cifs_launder_page at ffffffffc0a0bb72 [cifs]
>  torvalds#7 [ffff99650065bb30] invalidate_inode_pages2_range at ffffffff9b04d4bd
>  torvalds#8 [ffff99650065bcb8] cifs_invalidate_mapping at ffffffffc0a11339 [cifs]
>  torvalds#9 [ffff99650065bcd0] cifs_revalidate_mapping at ffffffffc0a1139a [cifs]
> torvalds#10 [ffff99650065bcf0] cifs_d_revalidate at ffffffffc0a014f6 [cifs]
> torvalds#11 [ffff99650065bd08] path_openat at ffffffff9b0fe7f7
> torvalds#12 [ffff99650065bdd8] do_filp_open at ffffffff9b100a33
> torvalds#13 [ffff99650065bee0] do_sys_open at ffffffff9b0eb2d6
> torvalds#14 [ffff99650065bf38] do_syscall_64 at ffffffff9ae04315
>
> cifs_launder_page is for page 0xffffd1e2c07d2480
>
> crash> page.index,mapping,flags 0xffffd1e2c07d2480
>       index = 0x8
>       mapping = 0xffff8c68f3cd0db0
>   flags = 0xfffffc0008095
>
>   PAGE-FLAG       BIT  VALUE
>   PG_locked         0  0000001
>   PG_uptodate       2  0000004
>   PG_lru            4  0000010
>   PG_waiters        7  0000080
>   PG_writeback     15  0008000
>
>
> inode is ffff8c68f3cd0c40
> inode.i_rwsem is ffff8c68f3cd0ce0
>      DENTRY           INODE           SUPERBLK     TYPE PATH
> ffff8c68a1f1b480 ffff8c68f3cd0c40 ffff8c6915bf9000 REG
> /mnt/vm1_smb/testfile.8853
>
>
> this process holds the inode->i_rwsem for the parent directory, is
> laundering a page attached to the inode of the file it's opening, and in
> _cifsFileInfo_put is trying to down_write the cifsInodeInflock_sem
> for the file itself.
>
>
> <struct rw_semaphore 0xffff8c68f3cd0ce0>:
>         owner: <struct task_struct 0xffff8c6914272e80> (UN -   8854 -
> reopen_file), counter: 0x0000000000000003
>         waitlist: 1
>         0xffff9965005dfd80     8855   reopen_file      UN 0  1:29:22.912
>   RWSEM_WAITING_FOR_WRITE
>
> this is the inode.i_rwsem for the file
>
> the owner:
>
> [0 00:48:22.739] [UN]  PID: 8854   TASK: ffff8c6914272e80  CPU: 2
> COMMAND: "reopen_file"
>  #0 [ffff99650054fb38] __schedule at ffffffff9b6e6095
>  #1 [ffff99650054fbc8] schedule at ffffffff9b6e64df
>  #2 [ffff99650054fbd8] io_schedule at ffffffff9b6e68e2
>  #3 [ffff99650054fbe8] __lock_page at ffffffff9b03c56f
>  #4 [ffff99650054fc80] pagecache_get_page at ffffffff9b03dcdf
>  #5 [ffff99650054fcc0] grab_cache_page_write_begin at ffffffff9b03ef4c
>  torvalds#6 [ffff99650054fcd0] cifs_write_begin at ffffffffc0a064ec [cifs]
>  torvalds#7 [ffff99650054fd30] generic_perform_write at ffffffff9b03bba4
>  torvalds#8 [ffff99650054fda8] __generic_file_write_iter at ffffffff9b04060a
>  torvalds#9 [ffff99650054fdf0] cifs_strict_writev.cold.70 at ffffffffc0a4469b [cifs]
> torvalds#10 [ffff99650054fe48] new_sync_write at ffffffff9b0ec1dd
> torvalds#11 [ffff99650054fed0] vfs_write at ffffffff9b0eed35
> torvalds#12 [ffff99650054ff00] ksys_write at ffffffff9b0eefd9
> torvalds#13 [ffff99650054ff38] do_syscall_64 at ffffffff9ae04315
>
> the process holds the inode->i_rwsem for the file to which it's writing,
> and is trying to __lock_page for the same page as in the other processes
>
>
> the other tasks:
> [0 00:00:00.028] [UN]  PID: 8859   TASK: ffff8c6915479740  CPU: 2
> COMMAND: "reopen_file"
>  #0 [ffff9965007b39d8] __schedule at ffffffff9b6e6095
>  #1 [ffff9965007b3a68] schedule at ffffffff9b6e64df
>  #2 [ffff9965007b3a78] schedule_timeout at ffffffff9b6e9f89
>  #3 [ffff9965007b3af0] msleep at ffffffff9af573a9
>  #4 [ffff9965007b3af8] cifs_new_fileinfo.cold.61 at ffffffffc0a42a07 [cifs]
>  #5 [ffff9965007b3b78] cifs_open at ffffffffc0a0709d [cifs]
>  torvalds#6 [ffff9965007b3cd8] do_dentry_open at ffffffff9b0e9b7a
>  torvalds#7 [ffff9965007b3d08] path_openat at ffffffff9b0fe34f
>  torvalds#8 [ffff9965007b3dd8] do_filp_open at ffffffff9b100a33
>  torvalds#9 [ffff9965007b3ee0] do_sys_open at ffffffff9b0eb2d6
> torvalds#10 [ffff9965007b3f38] do_syscall_64 at ffffffff9ae04315
>
> this is opening the file, and is trying to down_write cinode->lock_sem
>
>
> [0 00:00:00.041] [UN]  PID: 8860   TASK: ffff8c691547ae80  CPU: 2
> COMMAND: "reopen_file"
> [0 00:00:00.057] [UN]  PID: 8861   TASK: ffff8c6915478000  CPU: 3
> COMMAND: "reopen_file"
> [0 00:00:00.059] [UN]  PID: 8858   TASK: ffff8c6914271740  CPU: 2
> COMMAND: "reopen_file"
> [0 00:00:00.109] [UN]  PID: 8862   TASK: ffff8c691547dd00  CPU: 6
> COMMAND: "reopen_file"
>  #0 [ffff9965007c3c78] __schedule at ffffffff9b6e6095
>  #1 [ffff9965007c3d08] schedule at ffffffff9b6e64df
>  #2 [ffff9965007c3d18] schedule_timeout at ffffffff9b6e9f89
>  #3 [ffff9965007c3d90] msleep at ffffffff9af573a9
>  #4 [ffff9965007c3d98] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs]
>  #5 [ffff9965007c3e88] cifs_close at ffffffffc0a07aaf [cifs]
>  torvalds#6 [ffff9965007c3ea0] __fput at ffffffff9b0efa6e
>  torvalds#7 [ffff9965007c3ee8] task_work_run at ffffffff9aef1614
>  torvalds#8 [ffff9965007c3f20] exit_to_usermode_loop at ffffffff9ae03d6f
>  torvalds#9 [ffff9965007c3f38] do_syscall_64 at ffffffff9ae0444c
>
> closing the file, and trying to down_write cifsi->lock_sem
>
>
> [0 00:48:22.839] [UN]  PID: 8857   TASK: ffff8c6914270000  CPU: 7
> COMMAND: "reopen_file"
>  #0 [ffff9965006a7cc8] __schedule at ffffffff9b6e6095
>  #1 [ffff9965006a7d58] schedule at ffffffff9b6e64df
>  #2 [ffff9965006a7d68] io_schedule at ffffffff9b6e68e2
>  #3 [ffff9965006a7d78] wait_on_page_bit at ffffffff9b03cac6
>  #4 [ffff9965006a7e10] __filemap_fdatawait_range at ffffffff9b03b028
>  #5 [ffff9965006a7ed8] filemap_write_and_wait at ffffffff9b040165
>  torvalds#6 [ffff9965006a7ef0] cifs_flush at ffffffffc0a0c2fa [cifs]
>  torvalds#7 [ffff9965006a7f10] filp_close at ffffffff9b0e93f1
>  torvalds#8 [ffff9965006a7f30] __x64_sys_close at ffffffff9b0e9a0e
>  torvalds#9 [ffff9965006a7f38] do_syscall_64 at ffffffff9ae04315
>
> in __filemap_fdatawait_range
>                         wait_on_page_writeback(page);
> for the same page of the file
>
>
>
> [0 00:48:22.718] [UN]  PID: 8855   TASK: ffff8c69142745c0  CPU: 7
> COMMAND: "reopen_file"
>  #0 [ffff9965005dfc98] __schedule at ffffffff9b6e6095
>  #1 [ffff9965005dfd28] schedule at ffffffff9b6e64df
>  #2 [ffff9965005dfd38] rwsem_down_write_slowpath at ffffffff9af283d7
>  #3 [ffff9965005dfdf0] cifs_strict_writev at ffffffffc0a0c40a [cifs]
>  #4 [ffff9965005dfe48] new_sync_write at ffffffff9b0ec1dd
>  #5 [ffff9965005dfed0] vfs_write at ffffffff9b0eed35
>  torvalds#6 [ffff9965005dff00] ksys_write at ffffffff9b0eefd9
>  torvalds#7 [ffff9965005dff38] do_syscall_64 at ffffffff9ae04315
>
>         inode_lock(inode);
>
>
> and one 'ls' later on, to see whether the rest of the mount is available
> (the test file is in the root, so we get blocked up on the directory
> ->i_rwsem), so the entire mount is unavailable
>
> [0 00:36:26.473] [UN]  PID: 9802   TASK: ffff8c691436ae80  CPU: 4
> COMMAND: "ls"
>  #0 [ffff996500393d28] __schedule at ffffffff9b6e6095
>  #1 [ffff996500393db8] schedule at ffffffff9b6e64df
>  #2 [ffff996500393dc8] rwsem_down_read_slowpath at ffffffff9b6e9421
>  #3 [ffff996500393e78] down_read_killable at ffffffff9b6e95e2
>  #4 [ffff996500393e88] iterate_dir at ffffffff9b103c56
>  #5 [ffff996500393ec8] ksys_getdents64 at ffffffff9b104b0c
>  torvalds#6 [ffff996500393f30] __x64_sys_getdents64 at ffffffff9b104bb6
>  torvalds#7 [ffff996500393f38] do_syscall_64 at ffffffff9ae04315
>
> in iterate_dir:
>         if (shared)
>                 res = down_read_killable(&inode->i_rwsem);  <<<<
>         else
>                 res = down_write_killable(&inode->i_rwsem);
>

Reported-by: Frank Sorenson <[email protected]>
Reviewed-by: Pavel Shilovsky <[email protected]>
Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
neilbrown pushed a commit that referenced this pull request Jan 25, 2020
commit 5c9934b upstream.

We got another syzbot report [1] that tells us we must use
write_lock_irq()/write_unlock_irq() to avoid possible deadlock.

[1]

WARNING: inconsistent lock state
5.5.0-rc1-syzkaller #0 Not tainted
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-R} usage.
syz-executor826/9605 [HC1[1]:SC0[0]:HE0:SE1] takes:
ffffffff8a128718 (disc_data_lock){+-..}, at: sp_get.isra.0+0x1d/0xf0 drivers/net/ppp/ppp_synctty.c:138
{HARDIRQ-ON-W} state was registered at:
  lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4485
  __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
  _raw_write_lock_bh+0x33/0x50 kernel/locking/spinlock.c:319
  sixpack_close+0x1d/0x250 drivers/net/hamradio/6pack.c:657
  tty_ldisc_close.isra.0+0x119/0x1a0 drivers/tty/tty_ldisc.c:489
  tty_set_ldisc+0x230/0x6b0 drivers/tty/tty_ldisc.c:585
  tiocsetd drivers/tty/tty_io.c:2337 [inline]
  tty_ioctl+0xe8d/0x14f0 drivers/tty/tty_io.c:2597
  vfs_ioctl fs/ioctl.c:47 [inline]
  file_ioctl fs/ioctl.c:545 [inline]
  do_vfs_ioctl+0x977/0x14e0 fs/ioctl.c:732
  ksys_ioctl+0xab/0xd0 fs/ioctl.c:749
  __do_sys_ioctl fs/ioctl.c:756 [inline]
  __se_sys_ioctl fs/ioctl.c:754 [inline]
  __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:754
  do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
irq event stamp: 3946
hardirqs last  enabled at (3945): [<ffffffff87c86e43>] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:168 [inline]
hardirqs last  enabled at (3945): [<ffffffff87c86e43>] _raw_spin_unlock_irq+0x23/0x80 kernel/locking/spinlock.c:199
hardirqs last disabled at (3946): [<ffffffff8100675f>] trace_hardirqs_off_thunk+0x1a/0x1c arch/x86/entry/thunk_64.S:42
softirqs last  enabled at (2658): [<ffffffff86a8b4df>] spin_unlock_bh include/linux/spinlock.h:383 [inline]
softirqs last  enabled at (2658): [<ffffffff86a8b4df>] clusterip_netdev_event+0x46f/0x670 net/ipv4/netfilter/ipt_CLUSTERIP.c:222
softirqs last disabled at (2656): [<ffffffff86a8b22b>] spin_lock_bh include/linux/spinlock.h:343 [inline]
softirqs last disabled at (2656): [<ffffffff86a8b22b>] clusterip_netdev_event+0x1bb/0x670 net/ipv4/netfilter/ipt_CLUSTERIP.c:196

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(disc_data_lock);
  <Interrupt>
    lock(disc_data_lock);

 *** DEADLOCK ***

5 locks held by syz-executor826/9605:
 #0: ffff8880a905e198 (&tty->legacy_mutex){+.+.}, at: tty_lock+0xc7/0x130 drivers/tty/tty_mutex.c:19
 #1: ffffffff899a56c0 (rcu_read_lock){....}, at: mutex_spin_on_owner+0x0/0x330 kernel/locking/mutex.c:413
 #2: ffff8880a496a2b0 (&(&i->lock)->rlock){-.-.}, at: spin_lock include/linux/spinlock.h:338 [inline]
 #2: ffff8880a496a2b0 (&(&i->lock)->rlock){-.-.}, at: serial8250_interrupt+0x2d/0x1a0 drivers/tty/serial/8250/8250_core.c:116
 #3: ffffffff8c104048 (&port_lock_key){-.-.}, at: serial8250_handle_irq.part.0+0x24/0x330 drivers/tty/serial/8250/8250_port.c:1823
 #4: ffff8880a905e090 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref+0x22/0x90 drivers/tty/tty_ldisc.c:288

stack backtrace:
CPU: 1 PID: 9605 Comm: syz-executor826 Not tainted 5.5.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x197/0x210 lib/dump_stack.c:118
 print_usage_bug.cold+0x327/0x378 kernel/locking/lockdep.c:3101
 valid_state kernel/locking/lockdep.c:3112 [inline]
 mark_lock_irq kernel/locking/lockdep.c:3309 [inline]
 mark_lock+0xbb4/0x1220 kernel/locking/lockdep.c:3666
 mark_usage kernel/locking/lockdep.c:3554 [inline]
 __lock_acquire+0x1e55/0x4a00 kernel/locking/lockdep.c:3909
 lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4485
 __raw_read_lock include/linux/rwlock_api_smp.h:149 [inline]
 _raw_read_lock+0x32/0x50 kernel/locking/spinlock.c:223
 sp_get.isra.0+0x1d/0xf0 drivers/net/ppp/ppp_synctty.c:138
 sixpack_write_wakeup+0x25/0x340 drivers/net/hamradio/6pack.c:402
 tty_wakeup+0xe9/0x120 drivers/tty/tty_io.c:536
 tty_port_default_wakeup+0x2b/0x40 drivers/tty/tty_port.c:50
 tty_port_tty_wakeup+0x57/0x70 drivers/tty/tty_port.c:387
 uart_write_wakeup+0x46/0x70 drivers/tty/serial/serial_core.c:104
 serial8250_tx_chars+0x495/0xaf0 drivers/tty/serial/8250/8250_port.c:1761
 serial8250_handle_irq.part.0+0x2a2/0x330 drivers/tty/serial/8250/8250_port.c:1834
 serial8250_handle_irq drivers/tty/serial/8250/8250_port.c:1820 [inline]
 serial8250_default_handle_irq+0xc0/0x150 drivers/tty/serial/8250/8250_port.c:1850
 serial8250_interrupt+0xf1/0x1a0 drivers/tty/serial/8250/8250_core.c:126
 __handle_irq_event_percpu+0x15d/0x970 kernel/irq/handle.c:149
 handle_irq_event_percpu+0x74/0x160 kernel/irq/handle.c:189
 handle_irq_event+0xa7/0x134 kernel/irq/handle.c:206
 handle_edge_irq+0x25e/0x8d0 kernel/irq/chip.c:830
 generic_handle_irq_desc include/linux/irqdesc.h:156 [inline]
 do_IRQ+0xde/0x280 arch/x86/kernel/irq.c:250
 common_interrupt+0xf/0xf arch/x86/entry/entry_64.S:607
 </IRQ>
RIP: 0010:cpu_relax arch/x86/include/asm/processor.h:685 [inline]
RIP: 0010:mutex_spin_on_owner+0x247/0x330 kernel/locking/mutex.c:579
Code: c3 be 08 00 00 00 4c 89 e7 e8 e5 06 59 00 4c 89 e0 48 c1 e8 03 42 80 3c 38 00 0f 85 e1 00 00 00 49 8b 04 24 a8 01 75 96 f3 90 <e9> 2f fe ff ff 0f 0b e8 0d 19 09 00 84 c0 0f 85 ff fd ff ff 48 c7
RSP: 0018:ffffc90001eafa20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd7
RAX: 0000000000000000 RBX: ffff88809fd9e0c0 RCX: 1ffffffff13266dd
RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000000
RBP: ffffc90001eafa60 R08: 1ffff11013d22898 R09: ffffed1013d22899
R10: ffffed1013d22898 R11: ffff88809e9144c7 R12: ffff8880a905e138
R13: ffff88809e9144c0 R14: 0000000000000000 R15: dffffc0000000000
 mutex_optimistic_spin kernel/locking/mutex.c:673 [inline]
 __mutex_lock_common kernel/locking/mutex.c:962 [inline]
 __mutex_lock+0x32b/0x13c0 kernel/locking/mutex.c:1106
 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1121
 tty_lock+0xc7/0x130 drivers/tty/tty_mutex.c:19
 tty_release+0xb5/0xe90 drivers/tty/tty_io.c:1665
 __fput+0x2ff/0x890 fs/file_table.c:280
 ____fput+0x16/0x20 fs/file_table.c:313
 task_work_run+0x145/0x1c0 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x8e7/0x2ef0 kernel/exit.c:797
 do_group_exit+0x135/0x360 kernel/exit.c:895
 __do_sys_exit_group kernel/exit.c:906 [inline]
 __se_sys_exit_group kernel/exit.c:904 [inline]
 __x64_sys_exit_group+0x44/0x50 kernel/exit.c:904
 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x43fef8
Code: Bad RIP value.
RSP: 002b:00007ffdb07d2338 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000043fef8
RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
RBP: 00000000004bf730 R08: 00000000000000e7 R09: ffffffffffffffd0
R10: 00000000004002c8 R11: 0000000000000246 R12: 0000000000000001
R13: 00000000006d1180 R14: 0000000000000000 R15: 0000000000000000

Fixes: 6e4e2f8 ("6pack,mkiss: fix lock inconsistency")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
neilbrown pushed a commit that referenced this pull request Jan 25, 2020
[ Upstream commit 5a9ef19 ]

We would not be transmitting using the correct SYSTEMPORT transmit queue
during ndo_select_queue() which looks up the internal TX ring map
because while establishing the mapping we would be off by 4, so for
instance, when we populate switch port mappings we would be doing:

switch port 0, queue 0 -> ring index #0
switch port 0, queue 1 -> ring index #1
...
switch port 0, queue 3 -> ring index #3
switch port 1, queue 0 -> ring index torvalds#8 (4 + 4 * 1)
...

instead of using ring index #4. This would cause our ndo_select_queue()
to use the fallback queue mechanism which would pick up an incorrect
ring for that switch port. Fix this by using the correct switch queue
number instead of SYSTEMPORT queue number.

Fixes: 25c4407 ("net: systemport: Simplify queue mapping logic")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
neilbrown pushed a commit that referenced this pull request Jan 25, 2020
commit ba03a9b upstream.

Francois reported that VMware balloon gets stuck after a balloon reset,
when the VMCI doorbell is removed. A similar error can occur when the
balloon driver is removed with the following splat:

[ 1088.622000] INFO: task modprobe:3565 blocked for more than 120 seconds.
[ 1088.622035]       Tainted: G        W         5.2.0 #4
[ 1088.622087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1088.622205] modprobe        D    0  3565   1450 0x00000000
[ 1088.622210] Call Trace:
[ 1088.622246]  __schedule+0x2a8/0x690
[ 1088.622248]  schedule+0x2d/0x90
[ 1088.622250]  schedule_timeout+0x1d3/0x2f0
[ 1088.622252]  wait_for_completion+0xba/0x140
[ 1088.622320]  ? wake_up_q+0x80/0x80
[ 1088.622370]  vmci_resource_remove+0xb9/0xc0 [vmw_vmci]
[ 1088.622373]  vmci_doorbell_destroy+0x9e/0xd0 [vmw_vmci]
[ 1088.622379]  vmballoon_vmci_cleanup+0x6e/0xf0 [vmw_balloon]
[ 1088.622381]  vmballoon_exit+0x18/0xcc8 [vmw_balloon]
[ 1088.622394]  __x64_sys_delete_module+0x146/0x280
[ 1088.622408]  do_syscall_64+0x5a/0x130
[ 1088.622410]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1088.622415] RIP: 0033:0x7f54f62791b7
[ 1088.622421] Code: Bad RIP value.
[ 1088.622421] RSP: 002b:00007fff2a949008 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 1088.622426] RAX: ffffffffffffffda RBX: 000055dff8b55d00 RCX: 00007f54f62791b7
[ 1088.622426] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055dff8b55d68
[ 1088.622427] RBP: 000055dff8b55d00 R08: 00007fff2a947fb1 R09: 0000000000000000
[ 1088.622427] R10: 00007f54f62f5cc0 R11: 0000000000000206 R12: 000055dff8b55d68
[ 1088.622428] R13: 0000000000000001 R14: 000055dff8b55d68 R15: 00007fff2a94a3f0

The cause for the bug is that when the "delayed" doorbell is invoked, it
takes a reference on the doorbell entry and schedules work that is
supposed to run the appropriate code and drop the doorbell entry
reference. The code ignores the fact that if the work is already queued,
it will not be scheduled to run one more time. As a result one of the
references would not be dropped. When the code waits for the reference
to get to zero, during balloon reset or module removal, it gets stuck.

Fix it. Drop the reference if schedule_work() indicates that the work is
already queued.

Note that this bug got more apparent (or apparent at all) due to
commit ce66433 ("vmw_balloon: VMCI_DOORBELL_SET does not check status").

Fixes: 83e2ec7 ("VMCI: doorbell implementation.")
Reported-by: Francois Rigault <[email protected]>
Cc: Jorgen Hansen <[email protected]>
Cc: Adit Ranadive <[email protected]>
Cc: Alexios Zavras <[email protected]>
Cc: Vishnu DASA <[email protected]>
Cc: [email protected]
Signed-off-by: Nadav Amit <[email protected]>
Reviewed-by: Vishnu Dasa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
neilbrown pushed a commit that referenced this pull request Jan 25, 2020
[ Upstream commit af8a85a ]

Calling ceph_buffer_put() in fill_inode() may result in freeing the
i_xattrs.blob buffer while holding the i_ceph_lock.  This can be fixed by
postponing the call until later, when the lock is released.

The following backtrace was triggered by fstests generic/070.

  BUG: sleeping function called from invalid context at mm/vmalloc.c:2283
  in_atomic(): 1, irqs_disabled(): 0, pid: 3852, name: kworker/0:4
  6 locks held by kworker/0:4/3852:
   #0: 000000004270f6bb ((wq_completion)ceph-msgr){+.+.}, at: process_one_work+0x1b8/0x5f0
   #1: 00000000eb420803 ((work_completion)(&(&con->work)->work)){+.+.}, at: process_one_work+0x1b8/0x5f0
   #2: 00000000be1c53a4 (&s->s_mutex){+.+.}, at: dispatch+0x288/0x1476
   #3: 00000000559cb958 (&mdsc->snap_rwsem){++++}, at: dispatch+0x2eb/0x1476
   #4: 000000000d5ebbae (&req->r_fill_mutex){+.+.}, at: dispatch+0x2fc/0x1476
   #5: 00000000a83d0514 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: fill_inode.isra.0+0xf8/0xf70
  CPU: 0 PID: 3852 Comm: kworker/0:4 Not tainted 5.2.0+ torvalds#441
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
  Workqueue: ceph-msgr ceph_con_workfn
  Call Trace:
   dump_stack+0x67/0x90
   ___might_sleep.cold+0x9f/0xb1
   vfree+0x4b/0x60
   ceph_buffer_release+0x1b/0x60
   fill_inode.isra.0+0xa9b/0xf70
   ceph_fill_trace+0x13b/0xc70
   ? dispatch+0x2eb/0x1476
   dispatch+0x320/0x1476
   ? __mutex_unlock_slowpath+0x4d/0x2a0
   ceph_con_workfn+0xc97/0x2ec0
   ? process_one_work+0x1b8/0x5f0
   process_one_work+0x244/0x5f0
   worker_thread+0x4d/0x3e0
   kthread+0x105/0x140
   ? process_one_work+0x5f0/0x5f0
   ? kthread_park+0x90/0x90
   ret_from_fork+0x3a/0x50

Signed-off-by: Luis Henriques <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
neilbrown pushed a commit that referenced this pull request Jan 25, 2020
[ Upstream commit 6ce220d ]

If stripe in batch list is set with STRIPE_HANDLE flag, then the stripe
could be set with STRIPE_ACTIVE by the handle_stripe function. And if
error happens to the batch_head at the same time, break_stripe_batch_list
is called, then below warning could happen (the same report in [1]), it
means a member of batch list was set with STRIPE_ACTIVE.

[7028915.431770] stripe state: 2001
[7028915.431815] ------------[ cut here ]------------
[7028915.431828] WARNING: CPU: 18 PID: 29089 at drivers/md/raid5.c:4614 break_stripe_batch_list+0x203/0x240 [raid456]
[...]
[7028915.431879] CPU: 18 PID: 29089 Comm: kworker/u82:5 Tainted: G           O    4.14.86-1-storage #4.14.86-1.2~deb9
[7028915.431881] Hardware name: Supermicro SSG-2028R-ACR24L/X10DRH-iT, BIOS 3.1 06/18/2018
[7028915.431888] Workqueue: raid5wq raid5_do_work [raid456]
[7028915.431890] task: ffff9ab0ef36d7c0 task.stack: ffffb72926f84000
[7028915.431896] RIP: 0010:break_stripe_batch_list+0x203/0x240 [raid456]
[7028915.431898] RSP: 0018:ffffb72926f87ba8 EFLAGS: 00010286
[7028915.431900] RAX: 0000000000000012 RBX: ffff9aaa84a98000 RCX: 0000000000000000
[7028915.431901] RDX: 0000000000000000 RSI: ffff9ab2bfa15458 RDI: ffff9ab2bfa15458
[7028915.431902] RBP: ffff9aaa8fb4e900 R08: 0000000000000001 R09: 0000000000002eb4
[7028915.431903] R10: 00000000ffffffff R11: 0000000000000000 R12: ffff9ab1736f1b00
[7028915.431904] R13: 0000000000000000 R14: ffff9aaa8fb4e900 R15: 0000000000000001
[7028915.431906] FS:  0000000000000000(0000) GS:ffff9ab2bfa00000(0000) knlGS:0000000000000000
[7028915.431907] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[7028915.431908] CR2: 00007ff953b9f5d8 CR3: 0000000bf4009002 CR4: 00000000003606e0
[7028915.431909] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[7028915.431910] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[7028915.431910] Call Trace:
[7028915.431923]  handle_stripe+0x8e7/0x2020 [raid456]
[7028915.431930]  ? __wake_up_common_lock+0x89/0xc0
[7028915.431935]  handle_active_stripes.isra.58+0x35f/0x560 [raid456]
[7028915.431939]  raid5_do_work+0xc6/0x1f0 [raid456]

Also commit 59fc630 ("RAID5: batch adjacent full stripe write")
said "If a stripe is added to batch list, then only the first stripe
of the list should be put to handle_list and run handle_stripe."

So don't set STRIPE_HANDLE to stripe which is already in batch list,
otherwise the stripe could be put to handle_list and run handle_stripe,
then the above warning could be triggered.

[1]. https://www.spinics.net/lists/raid/msg62552.html

Signed-off-by: Guoqing Jiang <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
neilbrown pushed a commit that referenced this pull request Jan 25, 2020
commit 143f6e7 upstream.

7471fb7 ("md/raid6: Fix anomily when recovering a single device in
RAID6.") avoids rereading P when it can be computed from other members.
However, this misses the chance to re-write the right data to P. This
patch sets R5_ReadError if the re-read fails.

Also, when re-read is skipped, we also missed the chance to reset
rdev->read_errors to 0. It can fail the disk when there are many read
errors on P member disk (other disks don't have read error)

V2: upper layer read request don't read parity/Q data. So there is no
need to consider such situation.

This is Reported-by: kbuild test robot <[email protected]>

Fixes: 7471fb7 ("md/raid6: Fix anomily when recovering a single device in RAID6.")
Cc: <[email protected]> #4.4+
Signed-off-by: Xiao Ni <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
neilbrown pushed a commit that referenced this pull request Apr 11, 2020
If scatter-gather operation is allowed, a large USB request is split
into multiple TRBs. For preparing TRBs for sg list, driver iterates
over the list and creates TRB for each sg and mark the chain bit to
false for the last sg. The current IOMMU driver is clubbing the list
of sgs which shares a page boundary into one and giving it to USB driver.
With this the number of sgs mapped it not equal to the the number of sgs
passed. Because of this USB driver is not marking the chain bit to false
since it couldn't iterate to the last sg. This patch addresses this issue
by marking the chain bit to false if it is the last mapped sg.

At a practical level, this patch resolves USB transfer stalls
seen with adb on dwc3 based db845c, pixel3 and other qcom
hardware after functionfs gadget added scatter-gather support
around v4.20.

Credit also to Anurag Kumar Vulisha <[email protected]>
who implemented a very similar fix to this issue.

Cc: Felipe Balbi <[email protected]>
Cc: Yang Fei <[email protected]>
Cc: Thinh Nguyen <[email protected]>
Cc: Tejas Joglekar <[email protected]>
Cc: Andrzej Pietrasiewicz <[email protected]>
Cc: Jack Pham <[email protected]>
Cc: Todd Kjos <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Linux USB List <[email protected]>
Cc: stable <[email protected]> #4.20+
Signed-off-by: Pratham Pratap <[email protected]>
[jstultz: Slight tweak to remove sg_is_last() usage, reworked
          commit message, minor comment tweak]
Signed-off-by: John Stultz <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
neilbrown pushed a commit that referenced this pull request Jun 8, 2020
commit 56df70a upstream.

find_mergeable_vma() can return NULL.  In this case, it leads to a crash
when we access vm_mm(its offset is 0x40) later in write_protect_page.
And this case did happen on our server.  The following call trace is
captured in kernel 4.19 with the following patch applied and KSM zero
page enabled on our server.

  commit e86c59b ("mm/ksm: improve deduplication of zero pages with colouring")

So add a vma check to fix it.

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
  Oops: 0000 [#1] SMP NOPTI
  CPU: 9 PID: 510 Comm: ksmd Kdump: loaded Tainted: G OE 4.19.36.bsk.9-amd64 #4.19.36.bsk.9
  RIP: try_to_merge_one_page+0xc7/0x760
  Code: 24 58 65 48 33 34 25 28 00 00 00 89 e8 0f 85 a3 06 00 00 48 83 c4
        60 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 46 08 a8 01 75 b8 <49>
        8b 44 24 40 4c 8d 7c 24 20 b9 07 00 00 00 4c 89 e6 4c 89 ff 48
  RSP: 0018:ffffadbdd9fffdb0 EFLAGS: 00010246
  RAX: ffffda83ffd4be08 RBX: ffffda83ffd4be40 RCX: 0000002c6e800000
  RDX: 0000000000000000 RSI: ffffda83ffd4be40 RDI: 0000000000000000
  RBP: ffffa11939f02ec0 R08: 0000000094e1a447 R09: 00000000abe76577
  R10: 0000000000000962 R11: 0000000000004e6a R12: 0000000000000000
  R13: ffffda83b1e06380 R14: ffffa18f31f072c0 R15: ffffda83ffd4be40
  FS: 0000000000000000(0000) GS:ffffa0da43b80000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000040 CR3: 0000002c77c0a003 CR4: 00000000007626e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
    ksm_scan_thread+0x115e/0x1960
    kthread+0xf5/0x130
    ret_from_fork+0x1f/0x30

[[email protected]: if the vma is out of date, just exit]
  Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: add the conventional braces, replace /** with /*]
Fixes: e86c59b ("mm/ksm: improve deduplication of zero pages with colouring")
Co-developed-by: Xiongchun Duan <[email protected]>
Signed-off-by: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Kirill Tkhai <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Claudio Imbrenda <[email protected]>
Cc: Markus Elfring <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
neilbrown pushed a commit that referenced this pull request Jun 8, 2020
commit 55b3209 upstream.

For skcipher algorithms, the input, output HW S/G tables
look like this: [IV, src][dst, IV]
Now, we can have 2 conditions here:
- there is no IV;
- src and dst are equal (in-place encryption) and scattered
and the error is an "off-by-one" in the HW S/G table.

This issue was seen with KASAN:
BUG: KASAN: slab-out-of-bounds in skcipher_edesc_alloc+0x95c/0x1018

Read of size 4 at addr ffff000022a02958 by task cryptomgr_test/321

CPU: 2 PID: 321 Comm: cryptomgr_test Not tainted
5.6.0-rc1-00165-ge4ef8383-dirty #4
Hardware name: LS1046A RDB Board (DT)
Call trace:
 dump_backtrace+0x0/0x260
 show_stack+0x14/0x20
 dump_stack+0xe8/0x144
 print_address_description.isra.11+0x64/0x348
 __kasan_report+0x11c/0x230
 kasan_report+0xc/0x18
 __asan_load4+0x90/0xb0
 skcipher_edesc_alloc+0x95c/0x1018
 skcipher_encrypt+0x84/0x150
 crypto_skcipher_encrypt+0x50/0x68
 test_skcipher_vec_cfg+0x4d4/0xc10
 test_skcipher_vec+0x178/0x1d8
 alg_test_skcipher+0xec/0x230
 alg_test.part.44+0x114/0x4a0
 alg_test+0x1c/0x60
 cryptomgr_test+0x34/0x58
 kthread+0x1b8/0x1c0
 ret_from_fork+0x10/0x18

Allocated by task 321:
 save_stack+0x24/0xb0
 __kasan_kmalloc.isra.10+0xc4/0xe0
 kasan_kmalloc+0xc/0x18
 __kmalloc+0x178/0x2b8
 skcipher_edesc_alloc+0x21c/0x1018
 skcipher_encrypt+0x84/0x150
 crypto_skcipher_encrypt+0x50/0x68
 test_skcipher_vec_cfg+0x4d4/0xc10
 test_skcipher_vec+0x178/0x1d8
 alg_test_skcipher+0xec/0x230
 alg_test.part.44+0x114/0x4a0
 alg_test+0x1c/0x60
 cryptomgr_test+0x34/0x58
 kthread+0x1b8/0x1c0
 ret_from_fork+0x10/0x18

Freed by task 0:
(stack is not available)

The buggy address belongs to the object at ffff000022a02800
 which belongs to the cache dma-kmalloc-512 of size 512
The buggy address is located 344 bytes inside of
 512-byte region [ffff000022a02800, ffff000022a02a00)
The buggy address belongs to the page:
page:fffffe00006a8000 refcount:1 mapcount:0 mapping:ffff00093200c400
index:0x0 compound_mapcount: 0
flags: 0xffff00000010200(slab|head)
raw: 0ffff00000010200 dead000000000100 dead000000000122 ffff00093200c400
raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff000022a02800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff000022a02880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff000022a02900: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
                                                    ^
 ffff000022a02980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff000022a02a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

Fixes: 334d37c ("crypto: caam - update IV using HW support")
Cc: <[email protected]> # v5.3+
Signed-off-by: Iuliana Prodan <[email protected]>
Reviewed-by: Horia Geantă <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
neilbrown pushed a commit that referenced this pull request Jun 8, 2020
commit e84fe99 upstream.

Without CONFIG_PREEMPT, it can happen that we get soft lockups detected,
e.g., while booting up.

  watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-next-20200331+ #4
  Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
  RIP: __pageblock_pfn_to_page+0x134/0x1c0
  Call Trace:
   set_zone_contiguous+0x56/0x70
   page_alloc_init_late+0x166/0x176
   kernel_init_freeable+0xfa/0x255
   kernel_init+0xa/0x106
   ret_from_fork+0x35/0x40

The issue becomes visible when having a lot of memory (e.g., 4TB)
assigned to a single NUMA node - a system that can easily be created
using QEMU.  Inside VMs on a hypervisor with quite some memory
overcommit, this is fairly easy to trigger.

Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Pankaj Gupta <[email protected]>
Reviewed-by: Baoquan He <[email protected]>
Reviewed-by: Shile Zhang <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Kirill Tkhai <[email protected]>
Cc: Shile Zhang <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Daniel Jordan <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Alexander Duyck <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
neilbrown pushed a commit that referenced this pull request Jun 8, 2020
commit 00e2176 upstream.

The check for the HWO flag in dwc3_gadget_ep_reclaim_trb_sg()
causes us to break out of the loop before we call
dwc3_gadget_ep_reclaim_completed_trb(), which is what likely
should be clearing the HWO flag.

This can cause odd behavior where we never reclaim all the trbs
in the sg list, so we never call giveback on a usb req, and that
will causes transfer stalls.

This effectively resovles the adb stalls seen on HiKey960
after userland changes started only using AIO in adbd.

Cc: YongQin Liu <[email protected]>
Cc: Anurag Kumar Vulisha <[email protected]>
Cc: Yang Fei <[email protected]>
Cc: Thinh Nguyen <[email protected]>
Cc: Tejas Joglekar <[email protected]>
Cc: Andrzej Pietrasiewicz <[email protected]>
Cc: Jack Pham <[email protected]>
Cc: Josh Gao <[email protected]>
Cc: Todd Kjos <[email protected]>
Cc: Felipe Balbi <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
Cc: [email protected] #4.20+
Signed-off-by: John Stultz <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
neilbrown pushed a commit that referenced this pull request Jun 8, 2020
[ Upstream commit 20a785a ]

This BUG halt was reported a while back, but the patch somehow got
missed:

PID: 2879   TASK: c16adaa0  CPU: 1   COMMAND: "sctpn"
 #0 [f418dd28] crash_kexec at c04a7d8c
 #1 [f418dd7c] oops_end at c0863e02
 #2 [f418dd90] do_invalid_op at c040aaca
 #3 [f418de28] error_code (via invalid_op) at c08631a5
    EAX: f34baac0  EBX: 00000090  ECX: f418deb0  EDX: f5542950  EBP: 00000000
    DS:  007b      ESI: f34ba800  ES:  007b      EDI: f418dea0  GS:  00e0
    CS:  0060      EIP: c046fa5e  ERR: ffffffff  EFLAGS: 00010286
 #4 [f418de5c] add_timer at c046fa5e
 #5 [f418de68] sctp_do_sm at f8db8c77 [sctp]
 torvalds#6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp]
 torvalds#7 [f418df48] inet_shutdown at c080baf9
 torvalds#8 [f418df5c] sys_shutdown at c079eedf
 torvalds#9 [f418df7] sys_socketcall at c079fe88
    EAX: ffffffda  EBX: 0000000d  ECX: bfceea90  EDX: 0937af98
    DS:  007b      ESI: 0000000c  ES:  007b      EDI: b7150ae4
    SS:  007b      ESP: bfceea7c  EBP: bfceeaa8  GS:  0033
    CS:  0073      EIP: b775c424  ERR: 00000066  EFLAGS: 00000282

It appears that the side effect that starts the shutdown timer was processed
multiple times, which can happen as multiple paths can trigger it.  This of
course leads to the BUG halt in add_timer getting called.

Fix seems pretty straightforward, just check before the timer is added if its
already been started.  If it has mod the timer instead to min(current
expiration, new expiration)

Its been tested but not confirmed to fix the problem, as the issue has only
occured in production environments where test kernels are enjoined from being
installed.  It appears to be a sane fix to me though.  Also, recentely,
Jere found a reproducer posted on list to confirm that this resolves the
issues

Signed-off-by: Neil Horman <[email protected]>
CC: Vlad Yasevich <[email protected]>
CC: "David S. Miller" <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
neilbrown pushed a commit that referenced this pull request Jun 8, 2020
commit 2b86cb8 upstream.

Be there a platform with the following layout:

      Regular NIC
       |
       +----> DSA master for switch port
               |
               +----> DSA master for another switch port

After changing DSA back to static lockdep class keys in commit
1a33e10 ("net: partially revert dynamic lockdep key changes"), this
kernel splat can be seen:

[   13.361198] ============================================
[   13.366524] WARNING: possible recursive locking detected
[   13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 Not tainted
[   13.377874] --------------------------------------------
[   13.383201] swapper/0/0 is trying to acquire lock:
[   13.388004] ffff0000668ff298 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[   13.397879]
[   13.397879] but task is already holding lock:
[   13.403727] ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[   13.413593]
[   13.413593] other info that might help us debug this:
[   13.420140]  Possible unsafe locking scenario:
[   13.420140]
[   13.426075]        CPU0
[   13.428523]        ----
[   13.430969]   lock(&dsa_slave_netdev_xmit_lock_key);
[   13.435946]   lock(&dsa_slave_netdev_xmit_lock_key);
[   13.440924]
[   13.440924]  *** DEADLOCK ***
[   13.440924]
[   13.446860]  May be due to missing lock nesting notation
[   13.446860]
[   13.453668] 6 locks held by swapper/0/0:
[   13.457598]  #0: ffff800010003de0 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400
[   13.466593]  #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x0/0x560
[   13.474803]  #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x64/0xb10
[   13.483886]  #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
[   13.492793]  #4: ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[   13.503094]  #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
[   13.512000]
[   13.512000] stack backtrace:
[   13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988
[   13.530421] Call trace:
[   13.532871]  dump_backtrace+0x0/0x1d8
[   13.536539]  show_stack+0x24/0x30
[   13.539862]  dump_stack+0xe8/0x150
[   13.543271]  __lock_acquire+0x1030/0x1678
[   13.547290]  lock_acquire+0xf8/0x458
[   13.550873]  _raw_spin_lock+0x44/0x58
[   13.554543]  __dev_queue_xmit+0x84c/0xbe0
[   13.558562]  dev_queue_xmit+0x24/0x30
[   13.562232]  dsa_slave_xmit+0xe0/0x128
[   13.565988]  dev_hard_start_xmit+0xf4/0x448
[   13.570182]  __dev_queue_xmit+0x808/0xbe0
[   13.574200]  dev_queue_xmit+0x24/0x30
[   13.577869]  neigh_resolve_output+0x15c/0x220
[   13.582237]  ip6_finish_output2+0x244/0xb10
[   13.586430]  __ip6_finish_output+0x1dc/0x298
[   13.590709]  ip6_output+0x84/0x358
[   13.594116]  mld_sendpack+0x2bc/0x560
[   13.597786]  mld_ifc_timer_expire+0x210/0x390
[   13.602153]  call_timer_fn+0xcc/0x400
[   13.605822]  run_timer_softirq+0x588/0x6e0
[   13.609927]  __do_softirq+0x118/0x590
[   13.613597]  irq_exit+0x13c/0x148
[   13.616918]  __handle_domain_irq+0x6c/0xc0
[   13.621023]  gic_handle_irq+0x6c/0x160
[   13.624779]  el1_irq+0xbc/0x180
[   13.627927]  cpuidle_enter_state+0xb4/0x4d0
[   13.632120]  cpuidle_enter+0x3c/0x50
[   13.635703]  call_cpuidle+0x44/0x78
[   13.639199]  do_idle+0x228/0x2c8
[   13.642433]  cpu_startup_entry+0x2c/0x48
[   13.646363]  rest_init+0x1ac/0x280
[   13.649773]  arch_call_rest_init+0x14/0x1c
[   13.653878]  start_kernel+0x490/0x4bc

Lockdep keys themselves were added in commit ab92d68 ("net: core:
add generic lockdep keys"), and it's very likely that this splat existed
since then, but I have no real way to check, since this stacked platform
wasn't supported by mainline back then.

>From Taehee's own words:

  This patch was considered that all stackable devices have LLTX flag.
  But the dsa doesn't have LLTX, so this splat happened.
  After this patch, dsa shares the same lockdep class key.
  On the nested dsa interface architecture, which you illustrated,
  the same lockdep class key will be used in __dev_queue_xmit() because
  dsa doesn't have LLTX.
  So that lockdep detects deadlock because the same lockdep class key is
  used recursively although actually the different locks are used.
  There are some ways to fix this problem.

  1. using NETIF_F_LLTX flag.
  If possible, using the LLTX flag is a very clear way for it.
  But I'm so sorry I don't know whether the dsa could have LLTX or not.

  2. using dynamic lockdep again.
  It means that each interface uses a separate lockdep class key.
  So, lockdep will not detect recursive locking.
  But this way has a problem that it could consume lockdep class key
  too many.
  Currently, lockdep can have 8192 lockdep class keys.
   - you can see this number with the following command.
     cat /proc/lockdep_stats
     lock-classes:                         1251 [max: 8192]
     ...
     The [max: 8192] means that the maximum number of lockdep class keys.
  If too many lockdep class keys are registered, lockdep stops to work.
  So, using a dynamic(separated) lockdep class key should be considered
  carefully.
  In addition, updating lockdep class key routine might have to be existing.
  (lockdep_register_key(), lockdep_set_class(), lockdep_unregister_key())

  3. Using lockdep subclass.
  A lockdep class key could have 8 subclasses.
  The different subclass is considered different locks by lockdep
  infrastructure.
  But "lock-classes" is not counted by subclasses.
  So, it could avoid stopping lockdep infrastructure by an overflow of
  lockdep class keys.
  This approach should also have an updating lockdep class key routine.
  (lockdep_set_subclass())

  4. Using nonvalidate lockdep class key.
  The lockdep infrastructure supports nonvalidate lockdep class key type.
  It means this lockdep is not validated by lockdep infrastructure.
  So, the splat will not happen but lockdep couldn't detect real deadlock
  case because lockdep really doesn't validate it.
  I think this should be used for really special cases.
  (lockdep_set_novalidate_class())

Further discussion here:
https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/

There appears to be no negative side-effect to declaring lockless TX for
the DSA virtual interfaces, which means they handle their own locking.
So that's what we do to make the splat go away.

Patch tested in a wide variety of cases: unicast, multicast, PTP, etc.

Fixes: ab92d68 ("net: core: add generic lockdep keys")
Suggested-by: Taehee Yoo <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
neilbrown pushed a commit that referenced this pull request Dec 7, 2020
cl_glimpse_lock() doesn't match a lock with LDLM_FL_BL_AST
even if this lock is acquired by the same thread earlier.

It needs only size to check for spare file,
so let't add LDLM_FL_CBPENDING to match flags.

 #1 [ffff9ba7326036f0] schedule at ffffffff87b67c49
 #2 [ffff9ba732603700] obd_get_request_slot at ffffffffc0dbe0a4 [obdclass]
 #3 [ffff9ba7326037b8] ldlm_cli_enqueue at ffffffffc0faedce [ptlrpc]
 #4 [ffff9ba732603878] mdc_enqueue_send at ffffffffc11b38a8 [mdc]
 #5 [ffff9ba732603938] mdc_lock_enqueue at ffffffffc11b3eb2 [mdc]
 torvalds#6 [ffff9ba7326039a8] cl_lock_enqueue at ffffffffc0dfee95 [obdclass]
 torvalds#7 [ffff9ba7326039e0] lov_lock_enqueue at ffffffffc10ef265 [lov]
 torvalds#8 [ffff9ba732603a20] cl_lock_enqueue at ffffffffc0dfee95 [obdclass]
 torvalds#9 [ffff9ba732603a58] cl_lock_request at ffffffffc0dff54b [obdclass]

HPE-bug-id: LUS-8690
WC-bug-id: https://jira.whamcloud.com/browse/LU-13987
Lustre-commit: 829a3a93d43e ("LU-13987 ldlm: Don't re-enqueue glimpse lock on read")
Signed-off-by: Andriy Skulysh <[email protected]>
Reviewed-by: Vitaly Fertman <[email protected]>
Reviewed-by: Alexander Boyko <[email protected]>
Reviewed-by: Andrew Perepechko <[email protected]>
Reviewed-on: https://review.whamcloud.com/40044
Reviewed-by: Alexander Boyko <[email protected]>
Reviewed-by: Vitaly Fertman <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>

Signed-off-by: NeilBrown <[email protected]>
neilbrown pushed a commit that referenced this pull request Jun 12, 2022
commit 232796d upstream.

When enabling quotas, we attempt to commit a transaction while holding the
mutex fs_info->qgroup_ioctl_lock. This can result on a deadlock with other
quota operations such as:

- qgroup creation and deletion, ioctl BTRFS_IOC_QGROUP_CREATE;

- adding and removing qgroup relations, ioctl BTRFS_IOC_QGROUP_ASSIGN.

This is because these operations join a transaction and after that they
attempt to lock the mutex fs_info->qgroup_ioctl_lock. Acquiring that mutex
after joining or starting a transaction is a pattern followed everywhere
in qgroups, so the quota enablement operation is the one at fault here,
and should not commit a transaction while holding that mutex.

Fix this by making the transaction commit while not holding the mutex.
We are safe from two concurrent tasks trying to enable quotas because
we are serialized by the rw semaphore fs_info->subvol_sem at
btrfs_ioctl_quota_ctl(), which is the only call site for enabling
quotas.

When this deadlock happens, it produces a trace like the following:

  INFO: task syz-executor:25604 blocked for more than 143 seconds.
  Not tainted 5.15.0-rc6 #4
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:syz-executor state:D stack:24800 pid:25604 ppid: 24873 flags:0x00004004
  Call Trace:
  context_switch kernel/sched/core.c:4940 [inline]
  __schedule+0xcd9/0x2530 kernel/sched/core.c:6287
  schedule+0xd3/0x270 kernel/sched/core.c:6366
  btrfs_commit_transaction+0x994/0x2e90 fs/btrfs/transaction.c:2201
  btrfs_quota_enable+0x95c/0x1790 fs/btrfs/qgroup.c:1120
  btrfs_ioctl_quota_ctl fs/btrfs/ioctl.c:4229 [inline]
  btrfs_ioctl+0x637e/0x7b70 fs/btrfs/ioctl.c:5010
  vfs_ioctl fs/ioctl.c:51 [inline]
  __do_sys_ioctl fs/ioctl.c:874 [inline]
  __se_sys_ioctl fs/ioctl.c:860 [inline]
  __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7f86920b2c4d
  RSP: 002b:00007f868f61ac58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007f86921d90a0 RCX: 00007f86920b2c4d
  RDX: 0000000020005e40 RSI: 00000000c0109428 RDI: 0000000000000008
  RBP: 00007f869212bd80 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f86921d90a0
  R13: 00007fff6d233e4f R14: 00007fff6d233ff0 R15: 00007f868f61adc0
  INFO: task syz-executor:25628 blocked for more than 143 seconds.
  Not tainted 5.15.0-rc6 #4
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:syz-executor state:D stack:29080 pid:25628 ppid: 24873 flags:0x00004004
  Call Trace:
  context_switch kernel/sched/core.c:4940 [inline]
  __schedule+0xcd9/0x2530 kernel/sched/core.c:6287
  schedule+0xd3/0x270 kernel/sched/core.c:6366
  schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6425
  __mutex_lock_common kernel/locking/mutex.c:669 [inline]
  __mutex_lock+0xc96/0x1680 kernel/locking/mutex.c:729
  btrfs_remove_qgroup+0xb7/0x7d0 fs/btrfs/qgroup.c:1548
  btrfs_ioctl_qgroup_create fs/btrfs/ioctl.c:4333 [inline]
  btrfs_ioctl+0x683c/0x7b70 fs/btrfs/ioctl.c:5014
  vfs_ioctl fs/ioctl.c:51 [inline]
  __do_sys_ioctl fs/ioctl.c:874 [inline]
  __se_sys_ioctl fs/ioctl.c:860 [inline]
  __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Reported-by: Hao Sun <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/CACkBjsZQF19bQ1C6=yetF3BvL10OSORpFUcWXTP6HErshDB4dQ@mail.gmail.com/
Fixes: 340f1aa ("btrfs: qgroups: Move transaction management inside btrfs_quota_enable/disable")
CC: [email protected] # 4.19
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
neilbrown pushed a commit that referenced this pull request Jun 12, 2022
commit 8b59b0a upstream.

arm32 uses software to simulate the instruction replaced
by kprobe. some instructions may be simulated by constructing
assembly functions. therefore, before executing instruction
simulation, it is necessary to construct assembly function
execution environment in C language through binding registers.
after kasan is enabled, the register binding relationship will
be destroyed, resulting in instruction simulation errors and
causing kernel panic.

the kprobe emulate instruction function is distributed in three
files: actions-common.c actions-arm.c actions-thumb.c, so disable
KASAN when compiling these files.

for example, use kprobe insert on cap_capable+20 after kasan
enabled, the cap_capable assembly code is as follows:
<cap_capable>:
e92d47f0	push	{r4, r5, r6, r7, r8, r9, sl, lr}
e1a05000	mov	r5, r0
e280006c	add	r0, r0, torvalds#108    ; 0x6c
e1a04001	mov	r4, r1
e1a06002	mov	r6, r2
e59fa090	ldr	sl, [pc, torvalds#144]  ;
ebfc7bf8	bl	c03aa4b4 <__asan_load4>
e595706c	ldr	r7, [r5, torvalds#108]  ; 0x6c
e2859014	add	r9, r5, torvalds#20
......
The emulate_ldr assembly code after enabling kasan is as follows:
c06f1384 <emulate_ldr>:
e92d47f0	push	{r4, r5, r6, r7, r8, r9, sl, lr}
e282803c	add	r8, r2, torvalds#60     ; 0x3c
e1a05000	mov	r5, r0
e7e37855	ubfx	r7, r5, torvalds#16, #4
e1a00008	mov	r0, r8
e1a09001	mov	r9, r1
e1a04002	mov	r4, r2
ebf35462	bl	c03c6530 <__asan_load4>
e357000f	cmp	r7, torvalds#15
e7e36655	ubfx	r6, r5, torvalds#12, #4
e205a00f	and	sl, r5, torvalds#15
0a000001	beq	c06f13bc <emulate_ldr+0x38>
e0840107	add	r0, r4, r7, lsl #2
ebf3545c	bl	c03c6530 <__asan_load4>
e084010a	add	r0, r4, sl, lsl #2
ebf3545a	bl	c03c6530 <__asan_load4>
e2890010	add	r0, r9, torvalds#16
ebf35458	bl	c03c6530 <__asan_load4>
e5990010	ldr	r0, [r9, torvalds#16]
e12fff30	blx	r0
e356000f	cm	r6, torvalds#15
1a000014	bne	c06f1430 <emulate_ldr+0xac>
e1a06000	mov	r6, r0
e2840040	add	r0, r4, torvalds#64     ; 0x40
......

when running in emulate_ldr to simulate the ldr instruction, panic
occurred, and the log is as follows:
Unable to handle kernel NULL pointer dereference at virtual address
00000090
pgd = ecb46400
[00000090] *pgd=2e0fa003, *pmd=00000000
Internal error: Oops: 206 [#1] SMP ARM
PC is at cap_capable+0x14/0xb0
LR is at emulate_ldr+0x50/0xc0
psr: 600d0293 sp : ecd63af8  ip : 00000004  fp : c0a7c30c
r10: 00000000  r9 : c30897f4  r8 : ecd63cd4
r7 : 0000000f  r6 : 0000000a  r5 : e59fa090  r4 : ecd63c98
r3 : c06ae294  r2 : 00000000  r1 : b7611300  r0 : bf4ec008
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 32c5387d  Table: 2d546400  DAC: 55555555
Process bash (pid: 1643, stack limit = 0xecd60190)
(cap_capable) from (kprobe_handler+0x218/0x340)
(kprobe_handler) from (kprobe_trap_handler+0x24/0x48)
(kprobe_trap_handler) from (do_undefinstr+0x13c/0x364)
(do_undefinstr) from (__und_svc_finish+0x0/0x30)
(__und_svc_finish) from (cap_capable+0x18/0xb0)
(cap_capable) from (cap_vm_enough_memory+0x38/0x48)
(cap_vm_enough_memory) from
(security_vm_enough_memory_mm+0x48/0x6c)
(security_vm_enough_memory_mm) from
(copy_process.constprop.5+0x16b4/0x25c8)
(copy_process.constprop.5) from (_do_fork+0xe8/0x55c)
(_do_fork) from (SyS_clone+0x1c/0x24)
(SyS_clone) from (__sys_trace_return+0x0/0x10)
Code: 0050a0e1 6c0080e2 0140a0e1 0260a0e1 (f801f0e7)

Fixes: 35aa1df ("ARM kprobes: instruction single-stepping support")
Fixes: 4210157 ("ARM: 9017/2: Enable KASan for ARM")
Signed-off-by: huangshaobo <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.