Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update ctx.h #37

Closed
wants to merge 1 commit into from
Closed

Update ctx.h #37

wants to merge 1 commit into from

Conversation

nmtigor
Copy link

@nmtigor nmtigor commented Jun 19, 2013

No description provided.

@nmtigor nmtigor closed this Jun 19, 2013
@nmtigor nmtigor deleted the patch-1 branch June 19, 2013 21:59
torvalds pushed a commit that referenced this pull request Jul 10, 2013
…ET pending data

We accidentally call down to ip6_push_pending_frames when uncorking
pending AF_INET data on a ipv6 socket. This results in the following
splat (from Dave Jones):

skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:126!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
+netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37
task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
RIP: 0010:[<ffffffff816e759c>]  [<ffffffff816e759c>] skb_panic+0x63/0x65
RSP: 0018:ffff8801e6431de8  EFLAGS: 00010282
RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
FS:  00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
Call Trace:
 [<ffffffff8159a9aa>] skb_push+0x3a/0x40
 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0
 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140
 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0
 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20
 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0
 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0
 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40
 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20
 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0
 [<ffffffff816f5d54>] tracesys+0xdd/0xe2
Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
RIP  [<ffffffff816e759c>] skb_panic+0x63/0x65
 RSP <ffff8801e6431de8>

This patch adds a check if the pending data is of address family AF_INET
and directly calls udp_push_ending_frames from udp_v6_push_pending_frames
if that is the case.

This bug was found by Dave Jones with trinity.

(Also move the initialization of fl6 below the AF_INET check, even if
not strictly necessary.)

Cc: Dave Jones <[email protected]>
Cc: YOSHIFUJI Hideaki <[email protected]>
Signed-off-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
hno pushed a commit to hno/linux that referenced this pull request Jul 23, 2013
Running one program that continuously hotplugs and replugs a cpu
concurrently with another program that continuously writes to the
scaling_setspeed node eventually deadlocks with:

=============================================
[ INFO: possible recursive locking detected ]
3.4.0 torvalds#37 Tainted: G        W
---------------------------------------------
filemonkey/122 is trying to acquire lock:
 (s_active#13){++++.+}, at: [<c01a3d28>] sysfs_remove_dir+0x9c/0xb4

but task is already holding lock:
 (s_active#13){++++.+}, at: [<c01a22f0>] sysfs_write_file+0xe8/0x140

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(s_active#13);
  lock(s_active#13);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by filemonkey/122:
 #0:  (&buffer->mutex){+.+.+.}, at: [<c01a2230>] sysfs_write_file+0x28/0x140
 #1:  (s_active#13){++++.+}, at: [<c01a22f0>] sysfs_write_file+0xe8/0x140

stack backtrace:
[<c0014fcc>] (unwind_backtrace+0x0/0x120) from [<c00ca600>] (validate_chain+0x6f8/0x1054)
[<c00ca600>] (validate_chain+0x6f8/0x1054) from [<c00cb778>] (__lock_acquire+0x81c/0x8d8)
[<c00cb778>] (__lock_acquire+0x81c/0x8d8) from [<c00cb9c0>] (lock_acquire+0x18c/0x1e8)
[<c00cb9c0>] (lock_acquire+0x18c/0x1e8) from [<c01a3ba8>] (sysfs_addrm_finish+0xd0/0x180)
[<c01a3ba8>] (sysfs_addrm_finish+0xd0/0x180) from [<c01a3d28>] (sysfs_remove_dir+0x9c/0xb4)
[<c01a3d28>] (sysfs_remove_dir+0x9c/0xb4) from [<c02d0e5c>] (kobject_del+0x10/0x38)
[<c02d0e5c>] (kobject_del+0x10/0x38) from [<c02d0f74>] (kobject_release+0xf0/0x194)
[<c02d0f74>] (kobject_release+0xf0/0x194) from [<c0565a98>] (cpufreq_cpu_put+0xc/0x24)
[<c0565a98>] (cpufreq_cpu_put+0xc/0x24) from [<c05683f0>] (store+0x6c/0x74)
[<c05683f0>] (store+0x6c/0x74) from [<c01a2314>] (sysfs_write_file+0x10c/0x140)
[<c01a2314>] (sysfs_write_file+0x10c/0x140) from [<c014af44>] (vfs_write+0xb0/0x128)
[<c014af44>] (vfs_write+0xb0/0x128) from [<c014b06c>] (sys_write+0x3c/0x68)
[<c014b06c>] (sys_write+0x3c/0x68) from [<c000e0e0>] (ret_fast_syscall+0x0/0x3c)

This is because store() in cpufreq.c indirectly calls
kobject_get() via cpufreq_cpu_get() and is the last one to call
kobject_put() via cpufreq_cpu_put(). Sysfs code should not call
kobject_get() or kobject_put() directly (see the comment around
sysfs_schedule_callback() for more information).

Fix this deadlock by introducing two new functions:

	struct cpufreq_policy *cpufreq_cpu_get_sysfs(unsigned int cpu)
	void cpufreq_cpu_put_sysfs(struct cpufreq_policy *data)

which do the same thing as cpufreq_cpu_{get,put}() but don't call
kobject functions.

To easily trigger this deadlock you can insert an msleep() with a
reasonably large value right after the fail label at the bottom
of the store() function in cpufreq.c and then write
scaling_setspeed in one task and offline the cpu in another. The
first task will hang and be detected by the hung task detector.

Signed-off-by: Stephen Boyd <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
tositrino pushed a commit to tositrino/linux that referenced this pull request Jul 29, 2013
…ET pending data

[ Upstream commit 8822b64 ]

We accidentally call down to ip6_push_pending_frames when uncorking
pending AF_INET data on a ipv6 socket. This results in the following
splat (from Dave Jones):

skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:126!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
+netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ torvalds#37
task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
RIP: 0010:[<ffffffff816e759c>]  [<ffffffff816e759c>] skb_panic+0x63/0x65
RSP: 0018:ffff8801e6431de8  EFLAGS: 00010282
RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
FS:  00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
Call Trace:
 [<ffffffff8159a9aa>] skb_push+0x3a/0x40
 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0
 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140
 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0
 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20
 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0
 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0
 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40
 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20
 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0
 [<ffffffff816f5d54>] tracesys+0xdd/0xe2
Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
RIP  [<ffffffff816e759c>] skb_panic+0x63/0x65
 RSP <ffff8801e6431de8>

This patch adds a check if the pending data is of address family AF_INET
and directly calls udp_push_ending_frames from udp_v6_push_pending_frames
if that is the case.

This bug was found by Dave Jones with trinity.

(Also move the initialization of fl6 below the AF_INET check, even if
not strictly necessary.)

Signed-off-by: Hannes Frederic Sowa <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: YOSHIFUJI Hideaki <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
heftig referenced this pull request in zen-kernel/zen-kernel Jul 29, 2013
…ET pending data

[ Upstream commit 8822b64 ]

We accidentally call down to ip6_push_pending_frames when uncorking
pending AF_INET data on a ipv6 socket. This results in the following
splat (from Dave Jones):

skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:126!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
+netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37
task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
RIP: 0010:[<ffffffff816e759c>]  [<ffffffff816e759c>] skb_panic+0x63/0x65
RSP: 0018:ffff8801e6431de8  EFLAGS: 00010282
RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
FS:  00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
Call Trace:
 [<ffffffff8159a9aa>] skb_push+0x3a/0x40
 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0
 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140
 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0
 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20
 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0
 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0
 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40
 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20
 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0
 [<ffffffff816f5d54>] tracesys+0xdd/0xe2
Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
RIP  [<ffffffff816e759c>] skb_panic+0x63/0x65
 RSP <ffff8801e6431de8>

This patch adds a check if the pending data is of address family AF_INET
and directly calls udp_push_ending_frames from udp_v6_push_pending_frames
if that is the case.

This bug was found by Dave Jones with trinity.

(Also move the initialization of fl6 below the AF_INET check, even if
not strictly necessary.)

Signed-off-by: Hannes Frederic Sowa <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: YOSHIFUJI Hideaki <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
baerwolf pushed a commit to baerwolf/linux-stephan that referenced this pull request Aug 3, 2013
…ET pending data

[ Upstream commit 8822b64 ]

We accidentally call down to ip6_push_pending_frames when uncorking
pending AF_INET data on a ipv6 socket. This results in the following
splat (from Dave Jones):

skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:126!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
+netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ torvalds#37
task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
RIP: 0010:[<ffffffff816e759c>]  [<ffffffff816e759c>] skb_panic+0x63/0x65
RSP: 0018:ffff8801e6431de8  EFLAGS: 00010282
RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
FS:  00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
Call Trace:
 [<ffffffff8159a9aa>] skb_push+0x3a/0x40
 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0
 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140
 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0
 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20
 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0
 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0
 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40
 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20
 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0
 [<ffffffff816f5d54>] tracesys+0xdd/0xe2
Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
RIP  [<ffffffff816e759c>] skb_panic+0x63/0x65
 RSP <ffff8801e6431de8>

This patch adds a check if the pending data is of address family AF_INET
and directly calls udp_push_ending_frames from udp_v6_push_pending_frames
if that is the case.

This bug was found by Dave Jones with trinity.

(Also move the initialization of fl6 below the AF_INET check, even if
not strictly necessary.)

Cc: Dave Jones <[email protected]>
Cc: YOSHIFUJI Hideaki <[email protected]>
Signed-off-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
mdrjr referenced this pull request in hardkernel/linux Aug 6, 2013
…ET pending data

[ Upstream commit 8822b64 ]

We accidentally call down to ip6_push_pending_frames when uncorking
pending AF_INET data on a ipv6 socket. This results in the following
splat (from Dave Jones):

skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:126!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
+netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37
task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
RIP: 0010:[<ffffffff816e759c>]  [<ffffffff816e759c>] skb_panic+0x63/0x65
RSP: 0018:ffff8801e6431de8  EFLAGS: 00010282
RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
FS:  00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
Call Trace:
 [<ffffffff8159a9aa>] skb_push+0x3a/0x40
 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0
 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140
 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0
 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20
 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0
 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0
 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40
 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20
 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0
 [<ffffffff816f5d54>] tracesys+0xdd/0xe2
Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
RIP  [<ffffffff816e759c>] skb_panic+0x63/0x65
 RSP <ffff8801e6431de8>

This patch adds a check if the pending data is of address family AF_INET
and directly calls udp_push_ending_frames from udp_v6_push_pending_frames
if that is the case.

This bug was found by Dave Jones with trinity.

(Also move the initialization of fl6 below the AF_INET check, even if
not strictly necessary.)

Cc: Dave Jones <[email protected]>
Cc: YOSHIFUJI Hideaki <[email protected]>
Signed-off-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
swarren pushed a commit to swarren/linux-tegra that referenced this pull request Oct 14, 2013
As the new x86 CPU bootup printout format code maintainer, I am
taking immediate action to improve and clean (and thus indulge
my OCD) the reporting of the cores when coming up online.

Fix padding to a right-hand alignment, cleanup code and bind
reporting width to the max number of supported CPUs on the
system, like this:

 [    0.074509] smpboot: Booting Node   0, Processors:      #1  #2  #3  #4  #5  torvalds#6  torvalds#7 OK
 [    0.644008] smpboot: Booting Node   1, Processors:  torvalds#8  torvalds#9 torvalds#10 torvalds#11 torvalds#12 torvalds#13 torvalds#14 torvalds#15 OK
 [    1.245006] smpboot: Booting Node   2, Processors: torvalds#16 torvalds#17 torvalds#18 torvalds#19 torvalds#20 torvalds#21 torvalds#22 torvalds#23 OK
 [    1.864005] smpboot: Booting Node   3, Processors: torvalds#24 torvalds#25 torvalds#26 torvalds#27 torvalds#28 torvalds#29 torvalds#30 torvalds#31 OK
 [    2.489005] smpboot: Booting Node   4, Processors: torvalds#32 torvalds#33 torvalds#34 torvalds#35 torvalds#36 torvalds#37 torvalds#38 torvalds#39 OK
 [    3.093005] smpboot: Booting Node   5, Processors: torvalds#40 torvalds#41 torvalds#42 torvalds#43 torvalds#44 torvalds#45 torvalds#46 torvalds#47 OK
 [    3.698005] smpboot: Booting Node   6, Processors: torvalds#48 torvalds#49 torvalds#50 torvalds#51 #52 #53 torvalds#54 torvalds#55 OK
 [    4.304005] smpboot: Booting Node   7, Processors: torvalds#56 torvalds#57 #58 torvalds#59 torvalds#60 torvalds#61 torvalds#62 torvalds#63 OK
 [    4.961413] Brought up 64 CPUs

and this:

 [    0.072367] smpboot: Booting Node   0, Processors:    #1 #2 #3 #4 #5 torvalds#6 torvalds#7 OK
 [    0.686329] Brought up 8 CPUs

Signed-off-by: Borislav Petkov <[email protected]>
Cc: Libin <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
swarren pushed a commit to swarren/linux-tegra that referenced this pull request Oct 14, 2013
Turn it into (for example):

[    0.073380] x86: Booting SMP configuration:
[    0.074005] .... node   #0, CPUs:          #1   #2   #3   #4   #5   torvalds#6   torvalds#7
[    0.603005] .... node   #1, CPUs:     torvalds#8   torvalds#9  torvalds#10  torvalds#11  torvalds#12  torvalds#13  torvalds#14  torvalds#15
[    1.200005] .... node   #2, CPUs:    torvalds#16  torvalds#17  torvalds#18  torvalds#19  torvalds#20  torvalds#21  torvalds#22  torvalds#23
[    1.796005] .... node   #3, CPUs:    torvalds#24  torvalds#25  torvalds#26  torvalds#27  torvalds#28  torvalds#29  torvalds#30  torvalds#31
[    2.393005] .... node   #4, CPUs:    torvalds#32  torvalds#33  torvalds#34  torvalds#35  torvalds#36  torvalds#37  torvalds#38  torvalds#39
[    2.996005] .... node   #5, CPUs:    torvalds#40  torvalds#41  torvalds#42  torvalds#43  torvalds#44  torvalds#45  torvalds#46  torvalds#47
[    3.600005] .... node   torvalds#6, CPUs:    torvalds#48  torvalds#49  torvalds#50  torvalds#51  #52  #53  torvalds#54  torvalds#55
[    4.202005] .... node   torvalds#7, CPUs:    torvalds#56  torvalds#57  #58  torvalds#59  torvalds#60  torvalds#61  torvalds#62  torvalds#63
[    4.811005] .... node   torvalds#8, CPUs:    torvalds#64  torvalds#65  torvalds#66  torvalds#67  torvalds#68  torvalds#69  #70  torvalds#71
[    5.421006] .... node   torvalds#9, CPUs:    torvalds#72  torvalds#73  torvalds#74  torvalds#75  torvalds#76  torvalds#77  torvalds#78  torvalds#79
[    6.032005] .... node  torvalds#10, CPUs:    torvalds#80  torvalds#81  torvalds#82  torvalds#83  torvalds#84  torvalds#85  torvalds#86  torvalds#87
[    6.648006] .... node  torvalds#11, CPUs:    torvalds#88  torvalds#89  torvalds#90  torvalds#91  torvalds#92  torvalds#93  torvalds#94  torvalds#95
[    7.262005] .... node  torvalds#12, CPUs:    torvalds#96  torvalds#97  torvalds#98  torvalds#99 torvalds#100 torvalds#101 torvalds#102 torvalds#103
[    7.865005] .... node  torvalds#13, CPUs:   torvalds#104 torvalds#105 torvalds#106 torvalds#107 torvalds#108 torvalds#109 torvalds#110 torvalds#111
[    8.466005] .... node  torvalds#14, CPUs:   torvalds#112 torvalds#113 torvalds#114 torvalds#115 torvalds#116 torvalds#117 torvalds#118 torvalds#119
[    9.073006] .... node  torvalds#15, CPUs:   torvalds#120 torvalds#121 torvalds#122 torvalds#123 torvalds#124 torvalds#125 torvalds#126 torvalds#127
[    9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

Change num_digits() to hpa's division-avoiding, cell-phone-typed
version which he went at great lengths and pains to submit on a
Saturday evening.

Signed-off-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
neilbrown pushed a commit to neilbrown/linux that referenced this pull request Oct 24, 2013
When operate harddisk and hit errors, md_set_badblocks is called after
scsi_restart_operations which already disabled the irq. but md_set_badblocks
will call write_sequnlock_irq and enable irq. so softirq can preempt the
current thread and that may cause a deadlock. I think this situation should
use write_sequnlock_irqsave/irqrestore instead.

I met the situation and the call trace is below:
[  638.919974] BUG: spinlock recursion on CPU#0, scsi_eh_13/1010
[  638.921923]  lock: 0xffff8800d4d51fc8, .magic: dead4ead, .owner: scsi_eh_13/1010, .owner_cpu: 0
[  638.923890] CPU: 0 PID: 1010 Comm: scsi_eh_13 Not tainted 3.12.0-rc5+ torvalds#37
[  638.925844] Hardware name: To be filled by O.E.M. To be filled by O.E.M./MAHOBAY, BIOS 4.6.5 03/05/2013
[  638.927816]  ffff880037ad4640 ffff880118c03d50 ffffffff8172ff85 0000000000000007
[  638.929829]  ffff8800d4d51fc8 ffff880118c03d70 ffffffff81730030 ffff8800d4d51fc8
[  638.931848]  ffffffff81a72eb0 ffff880118c03d90 ffffffff81730056 ffff8800d4d51fc8
[  638.933884] Call Trace:
[  638.935867]  <IRQ>  [<ffffffff8172ff85>] dump_stack+0x55/0x76
[  638.937878]  [<ffffffff81730030>] spin_dump+0x8a/0x8f
[  638.939861]  [<ffffffff81730056>] spin_bug+0x21/0x26
[  638.941836]  [<ffffffff81336de4>] do_raw_spin_lock+0xa4/0xc0
[  638.943801]  [<ffffffff8173f036>] _raw_spin_lock+0x66/0x80
[  638.945747]  [<ffffffff814a73ed>] ? scsi_device_unbusy+0x9d/0xd0
[  638.947672]  [<ffffffff8173fb1b>] ? _raw_spin_unlock+0x2b/0x50
[  638.949595]  [<ffffffff814a73ed>] scsi_device_unbusy+0x9d/0xd0
[  638.951504]  [<ffffffff8149ec47>] scsi_finish_command+0x37/0xe0
[  638.953388]  [<ffffffff814a75e8>] scsi_softirq_done+0xa8/0x140
[  638.955248]  [<ffffffff8130e32b>] blk_done_softirq+0x7b/0x90
[  638.957116]  [<ffffffff8104fddd>] __do_softirq+0xfd/0x330
[  638.958987]  [<ffffffff810b964f>] ? __lock_release+0x6f/0x100
[  638.960861]  [<ffffffff8174a5cc>] call_softirq+0x1c/0x30
[  638.962724]  [<ffffffff81004c7d>] do_softirq+0x8d/0xc0
[  638.964565]  [<ffffffff8105024e>] irq_exit+0x10e/0x150
[  638.966390]  [<ffffffff8174ad4a>] smp_apic_timer_interrupt+0x4a/0x60
[  638.968223]  [<ffffffff817499af>] apic_timer_interrupt+0x6f/0x80
[  638.970079]  <EOI>  [<ffffffff810b964f>] ? __lock_release+0x6f/0x100
[  638.971899]  [<ffffffff8173fa6a>] ? _raw_spin_unlock_irq+0x3a/0x50
[  638.973691]  [<ffffffff8173fa60>] ? _raw_spin_unlock_irq+0x30/0x50
[  638.975475]  [<ffffffff81562393>] md_set_badblocks+0x1f3/0x4a0
[  638.977243]  [<ffffffff81566e07>] rdev_set_badblocks+0x27/0x80
[  638.978988]  [<ffffffffa00d97bb>] raid5_end_read_request+0x36b/0x4e0 [raid456]
[  638.980723]  [<ffffffff811b5a1d>] bio_endio+0x1d/0x40
[  638.982463]  [<ffffffff81304ff3>] req_bio_endio.isra.65+0x83/0xa0
[  638.984214]  [<ffffffff81306b9f>] blk_update_request+0x7f/0x350
[  638.985967]  [<ffffffff81306ea1>] blk_update_bidi_request+0x31/0x90
[  638.987710]  [<ffffffff813085e0>] __blk_end_bidi_request+0x20/0x50
[  638.989439]  [<ffffffff8130862f>] __blk_end_request_all+0x1f/0x30
[  638.991149]  [<ffffffff81308746>] blk_peek_request+0x106/0x250
[  638.992861]  [<ffffffff814a62a9>] ? scsi_kill_request.isra.32+0xe9/0x130
[  638.994561]  [<ffffffff814a633a>] scsi_request_fn+0x4a/0x3d0
[  638.996251]  [<ffffffff813040a7>] __blk_run_queue+0x37/0x50
[  638.997900]  [<ffffffff813045af>] blk_run_queue+0x2f/0x50
[  638.999553]  [<ffffffff814a5750>] scsi_run_queue+0xe0/0x1c0
[  639.001185]  [<ffffffff814a7721>] scsi_run_host_queues+0x21/0x40
[  639.002798]  [<ffffffff814a2e87>] scsi_restart_operations+0x177/0x200
[  639.004391]  [<ffffffff814a4fe9>] scsi_error_handler+0xc9/0xe0
[  639.005996]  [<ffffffff814a4f20>] ? scsi_unjam_host+0xd0/0xd0
[  639.007600]  [<ffffffff81072f6b>] kthread+0xdb/0xe0
[  639.009205]  [<ffffffff81072e90>] ? flush_kthread_worker+0x170/0x170
[  639.010821]  [<ffffffff81748cac>] ret_from_fork+0x7c/0xb0
[  639.012437]  [<ffffffff81072e90>] ? flush_kthread_worker+0x170/0x170

This bug was introduce in commit  2e8ac30
(the first time rdev_set_badblock was call from interrupt context),
so this patch is appropriate for 3.5 and subsequent kernels.

Cc: <[email protected]> (3.5+)
Signed-off-by: Bian Yu <[email protected]>
Reviewed-by: Jianpeng Ma <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
kees pushed a commit to kees/linux that referenced this pull request Nov 8, 2013
…ET pending data

BugLink: http://bugs.launchpad.net/bugs/1214984

[ Upstream commit 8822b64 ]

We accidentally call down to ip6_push_pending_frames when uncorking
pending AF_INET data on a ipv6 socket. This results in the following
splat (from Dave Jones):

skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:126!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
+netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ torvalds#37
task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
RIP: 0010:[<ffffffff816e759c>]  [<ffffffff816e759c>] skb_panic+0x63/0x65
RSP: 0018:ffff8801e6431de8  EFLAGS: 00010282
RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
FS:  00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
Call Trace:
 [<ffffffff8159a9aa>] skb_push+0x3a/0x40
 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0
 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140
 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0
 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20
 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0
 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0
 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40
 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20
 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0
 [<ffffffff816f5d54>] tracesys+0xdd/0xe2
Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
RIP  [<ffffffff816e759c>] skb_panic+0x63/0x65
 RSP <ffff8801e6431de8>

This patch adds a check if the pending data is of address family AF_INET
and directly calls udp_push_ending_frames from udp_v6_push_pending_frames
if that is the case.

This bug was found by Dave Jones with trinity.

(Also move the initialization of fl6 below the AF_INET check, even if
not strictly necessary.)

Cc: Dave Jones <[email protected]>
Cc: YOSHIFUJI Hideaki <[email protected]>
Signed-off-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: Brad Figg <[email protected]>
damentz referenced this pull request in zen-kernel/zen-kernel Nov 17, 2013
commit 905b029 upstream.

When operate harddisk and hit errors, md_set_badblocks is called after
scsi_restart_operations which already disabled the irq. but md_set_badblocks
will call write_sequnlock_irq and enable irq. so softirq can preempt the
current thread and that may cause a deadlock. I think this situation should
use write_sequnlock_irqsave/irqrestore instead.

I met the situation and the call trace is below:
[  638.919974] BUG: spinlock recursion on CPU#0, scsi_eh_13/1010
[  638.921923]  lock: 0xffff8800d4d51fc8, .magic: dead4ead, .owner: scsi_eh_13/1010, .owner_cpu: 0
[  638.923890] CPU: 0 PID: 1010 Comm: scsi_eh_13 Not tainted 3.12.0-rc5+ #37
[  638.925844] Hardware name: To be filled by O.E.M. To be filled by O.E.M./MAHOBAY, BIOS 4.6.5 03/05/2013
[  638.927816]  ffff880037ad4640 ffff880118c03d50 ffffffff8172ff85 0000000000000007
[  638.929829]  ffff8800d4d51fc8 ffff880118c03d70 ffffffff81730030 ffff8800d4d51fc8
[  638.931848]  ffffffff81a72eb0 ffff880118c03d90 ffffffff81730056 ffff8800d4d51fc8
[  638.933884] Call Trace:
[  638.935867]  <IRQ>  [<ffffffff8172ff85>] dump_stack+0x55/0x76
[  638.937878]  [<ffffffff81730030>] spin_dump+0x8a/0x8f
[  638.939861]  [<ffffffff81730056>] spin_bug+0x21/0x26
[  638.941836]  [<ffffffff81336de4>] do_raw_spin_lock+0xa4/0xc0
[  638.943801]  [<ffffffff8173f036>] _raw_spin_lock+0x66/0x80
[  638.945747]  [<ffffffff814a73ed>] ? scsi_device_unbusy+0x9d/0xd0
[  638.947672]  [<ffffffff8173fb1b>] ? _raw_spin_unlock+0x2b/0x50
[  638.949595]  [<ffffffff814a73ed>] scsi_device_unbusy+0x9d/0xd0
[  638.951504]  [<ffffffff8149ec47>] scsi_finish_command+0x37/0xe0
[  638.953388]  [<ffffffff814a75e8>] scsi_softirq_done+0xa8/0x140
[  638.955248]  [<ffffffff8130e32b>] blk_done_softirq+0x7b/0x90
[  638.957116]  [<ffffffff8104fddd>] __do_softirq+0xfd/0x330
[  638.958987]  [<ffffffff810b964f>] ? __lock_release+0x6f/0x100
[  638.960861]  [<ffffffff8174a5cc>] call_softirq+0x1c/0x30
[  638.962724]  [<ffffffff81004c7d>] do_softirq+0x8d/0xc0
[  638.964565]  [<ffffffff8105024e>] irq_exit+0x10e/0x150
[  638.966390]  [<ffffffff8174ad4a>] smp_apic_timer_interrupt+0x4a/0x60
[  638.968223]  [<ffffffff817499af>] apic_timer_interrupt+0x6f/0x80
[  638.970079]  <EOI>  [<ffffffff810b964f>] ? __lock_release+0x6f/0x100
[  638.971899]  [<ffffffff8173fa6a>] ? _raw_spin_unlock_irq+0x3a/0x50
[  638.973691]  [<ffffffff8173fa60>] ? _raw_spin_unlock_irq+0x30/0x50
[  638.975475]  [<ffffffff81562393>] md_set_badblocks+0x1f3/0x4a0
[  638.977243]  [<ffffffff81566e07>] rdev_set_badblocks+0x27/0x80
[  638.978988]  [<ffffffffa00d97bb>] raid5_end_read_request+0x36b/0x4e0 [raid456]
[  638.980723]  [<ffffffff811b5a1d>] bio_endio+0x1d/0x40
[  638.982463]  [<ffffffff81304ff3>] req_bio_endio.isra.65+0x83/0xa0
[  638.984214]  [<ffffffff81306b9f>] blk_update_request+0x7f/0x350
[  638.985967]  [<ffffffff81306ea1>] blk_update_bidi_request+0x31/0x90
[  638.987710]  [<ffffffff813085e0>] __blk_end_bidi_request+0x20/0x50
[  638.989439]  [<ffffffff8130862f>] __blk_end_request_all+0x1f/0x30
[  638.991149]  [<ffffffff81308746>] blk_peek_request+0x106/0x250
[  638.992861]  [<ffffffff814a62a9>] ? scsi_kill_request.isra.32+0xe9/0x130
[  638.994561]  [<ffffffff814a633a>] scsi_request_fn+0x4a/0x3d0
[  638.996251]  [<ffffffff813040a7>] __blk_run_queue+0x37/0x50
[  638.997900]  [<ffffffff813045af>] blk_run_queue+0x2f/0x50
[  638.999553]  [<ffffffff814a5750>] scsi_run_queue+0xe0/0x1c0
[  639.001185]  [<ffffffff814a7721>] scsi_run_host_queues+0x21/0x40
[  639.002798]  [<ffffffff814a2e87>] scsi_restart_operations+0x177/0x200
[  639.004391]  [<ffffffff814a4fe9>] scsi_error_handler+0xc9/0xe0
[  639.005996]  [<ffffffff814a4f20>] ? scsi_unjam_host+0xd0/0xd0
[  639.007600]  [<ffffffff81072f6b>] kthread+0xdb/0xe0
[  639.009205]  [<ffffffff81072e90>] ? flush_kthread_worker+0x170/0x170
[  639.010821]  [<ffffffff81748cac>] ret_from_fork+0x7c/0xb0
[  639.012437]  [<ffffffff81072e90>] ? flush_kthread_worker+0x170/0x170

This bug was introduce in commit  2e8ac30
(the first time rdev_set_badblock was call from interrupt context),
so this patch is appropriate for 3.5 and subsequent kernels.

Signed-off-by: Bian Yu <[email protected]>
Reviewed-by: Jianpeng Ma <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
damentz referenced this pull request in zen-kernel/zen-kernel Nov 17, 2013
commit 905b029 upstream.

When operate harddisk and hit errors, md_set_badblocks is called after
scsi_restart_operations which already disabled the irq. but md_set_badblocks
will call write_sequnlock_irq and enable irq. so softirq can preempt the
current thread and that may cause a deadlock. I think this situation should
use write_sequnlock_irqsave/irqrestore instead.

I met the situation and the call trace is below:
[  638.919974] BUG: spinlock recursion on CPU#0, scsi_eh_13/1010
[  638.921923]  lock: 0xffff8800d4d51fc8, .magic: dead4ead, .owner: scsi_eh_13/1010, .owner_cpu: 0
[  638.923890] CPU: 0 PID: 1010 Comm: scsi_eh_13 Not tainted 3.12.0-rc5+ #37
[  638.925844] Hardware name: To be filled by O.E.M. To be filled by O.E.M./MAHOBAY, BIOS 4.6.5 03/05/2013
[  638.927816]  ffff880037ad4640 ffff880118c03d50 ffffffff8172ff85 0000000000000007
[  638.929829]  ffff8800d4d51fc8 ffff880118c03d70 ffffffff81730030 ffff8800d4d51fc8
[  638.931848]  ffffffff81a72eb0 ffff880118c03d90 ffffffff81730056 ffff8800d4d51fc8
[  638.933884] Call Trace:
[  638.935867]  <IRQ>  [<ffffffff8172ff85>] dump_stack+0x55/0x76
[  638.937878]  [<ffffffff81730030>] spin_dump+0x8a/0x8f
[  638.939861]  [<ffffffff81730056>] spin_bug+0x21/0x26
[  638.941836]  [<ffffffff81336de4>] do_raw_spin_lock+0xa4/0xc0
[  638.943801]  [<ffffffff8173f036>] _raw_spin_lock+0x66/0x80
[  638.945747]  [<ffffffff814a73ed>] ? scsi_device_unbusy+0x9d/0xd0
[  638.947672]  [<ffffffff8173fb1b>] ? _raw_spin_unlock+0x2b/0x50
[  638.949595]  [<ffffffff814a73ed>] scsi_device_unbusy+0x9d/0xd0
[  638.951504]  [<ffffffff8149ec47>] scsi_finish_command+0x37/0xe0
[  638.953388]  [<ffffffff814a75e8>] scsi_softirq_done+0xa8/0x140
[  638.955248]  [<ffffffff8130e32b>] blk_done_softirq+0x7b/0x90
[  638.957116]  [<ffffffff8104fddd>] __do_softirq+0xfd/0x330
[  638.958987]  [<ffffffff810b964f>] ? __lock_release+0x6f/0x100
[  638.960861]  [<ffffffff8174a5cc>] call_softirq+0x1c/0x30
[  638.962724]  [<ffffffff81004c7d>] do_softirq+0x8d/0xc0
[  638.964565]  [<ffffffff8105024e>] irq_exit+0x10e/0x150
[  638.966390]  [<ffffffff8174ad4a>] smp_apic_timer_interrupt+0x4a/0x60
[  638.968223]  [<ffffffff817499af>] apic_timer_interrupt+0x6f/0x80
[  638.970079]  <EOI>  [<ffffffff810b964f>] ? __lock_release+0x6f/0x100
[  638.971899]  [<ffffffff8173fa6a>] ? _raw_spin_unlock_irq+0x3a/0x50
[  638.973691]  [<ffffffff8173fa60>] ? _raw_spin_unlock_irq+0x30/0x50
[  638.975475]  [<ffffffff81562393>] md_set_badblocks+0x1f3/0x4a0
[  638.977243]  [<ffffffff81566e07>] rdev_set_badblocks+0x27/0x80
[  638.978988]  [<ffffffffa00d97bb>] raid5_end_read_request+0x36b/0x4e0 [raid456]
[  638.980723]  [<ffffffff811b5a1d>] bio_endio+0x1d/0x40
[  638.982463]  [<ffffffff81304ff3>] req_bio_endio.isra.65+0x83/0xa0
[  638.984214]  [<ffffffff81306b9f>] blk_update_request+0x7f/0x350
[  638.985967]  [<ffffffff81306ea1>] blk_update_bidi_request+0x31/0x90
[  638.987710]  [<ffffffff813085e0>] __blk_end_bidi_request+0x20/0x50
[  638.989439]  [<ffffffff8130862f>] __blk_end_request_all+0x1f/0x30
[  638.991149]  [<ffffffff81308746>] blk_peek_request+0x106/0x250
[  638.992861]  [<ffffffff814a62a9>] ? scsi_kill_request.isra.32+0xe9/0x130
[  638.994561]  [<ffffffff814a633a>] scsi_request_fn+0x4a/0x3d0
[  638.996251]  [<ffffffff813040a7>] __blk_run_queue+0x37/0x50
[  638.997900]  [<ffffffff813045af>] blk_run_queue+0x2f/0x50
[  638.999553]  [<ffffffff814a5750>] scsi_run_queue+0xe0/0x1c0
[  639.001185]  [<ffffffff814a7721>] scsi_run_host_queues+0x21/0x40
[  639.002798]  [<ffffffff814a2e87>] scsi_restart_operations+0x177/0x200
[  639.004391]  [<ffffffff814a4fe9>] scsi_error_handler+0xc9/0xe0
[  639.005996]  [<ffffffff814a4f20>] ? scsi_unjam_host+0xd0/0xd0
[  639.007600]  [<ffffffff81072f6b>] kthread+0xdb/0xe0
[  639.009205]  [<ffffffff81072e90>] ? flush_kthread_worker+0x170/0x170
[  639.010821]  [<ffffffff81748cac>] ret_from_fork+0x7c/0xb0
[  639.012437]  [<ffffffff81072e90>] ? flush_kthread_worker+0x170/0x170

This bug was introduce in commit  2e8ac30
(the first time rdev_set_badblock was call from interrupt context),
so this patch is appropriate for 3.5 and subsequent kernels.

Signed-off-by: Bian Yu <[email protected]>
Reviewed-by: Jianpeng Ma <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
mdrjr referenced this pull request in hardkernel/linux Nov 22, 2013
commit 905b029 upstream.

When operate harddisk and hit errors, md_set_badblocks is called after
scsi_restart_operations which already disabled the irq. but md_set_badblocks
will call write_sequnlock_irq and enable irq. so softirq can preempt the
current thread and that may cause a deadlock. I think this situation should
use write_sequnlock_irqsave/irqrestore instead.

I met the situation and the call trace is below:
[  638.919974] BUG: spinlock recursion on CPU#0, scsi_eh_13/1010
[  638.921923]  lock: 0xffff8800d4d51fc8, .magic: dead4ead, .owner: scsi_eh_13/1010, .owner_cpu: 0
[  638.923890] CPU: 0 PID: 1010 Comm: scsi_eh_13 Not tainted 3.12.0-rc5+ #37
[  638.925844] Hardware name: To be filled by O.E.M. To be filled by O.E.M./MAHOBAY, BIOS 4.6.5 03/05/2013
[  638.927816]  ffff880037ad4640 ffff880118c03d50 ffffffff8172ff85 0000000000000007
[  638.929829]  ffff8800d4d51fc8 ffff880118c03d70 ffffffff81730030 ffff8800d4d51fc8
[  638.931848]  ffffffff81a72eb0 ffff880118c03d90 ffffffff81730056 ffff8800d4d51fc8
[  638.933884] Call Trace:
[  638.935867]  <IRQ>  [<ffffffff8172ff85>] dump_stack+0x55/0x76
[  638.937878]  [<ffffffff81730030>] spin_dump+0x8a/0x8f
[  638.939861]  [<ffffffff81730056>] spin_bug+0x21/0x26
[  638.941836]  [<ffffffff81336de4>] do_raw_spin_lock+0xa4/0xc0
[  638.943801]  [<ffffffff8173f036>] _raw_spin_lock+0x66/0x80
[  638.945747]  [<ffffffff814a73ed>] ? scsi_device_unbusy+0x9d/0xd0
[  638.947672]  [<ffffffff8173fb1b>] ? _raw_spin_unlock+0x2b/0x50
[  638.949595]  [<ffffffff814a73ed>] scsi_device_unbusy+0x9d/0xd0
[  638.951504]  [<ffffffff8149ec47>] scsi_finish_command+0x37/0xe0
[  638.953388]  [<ffffffff814a75e8>] scsi_softirq_done+0xa8/0x140
[  638.955248]  [<ffffffff8130e32b>] blk_done_softirq+0x7b/0x90
[  638.957116]  [<ffffffff8104fddd>] __do_softirq+0xfd/0x330
[  638.958987]  [<ffffffff810b964f>] ? __lock_release+0x6f/0x100
[  638.960861]  [<ffffffff8174a5cc>] call_softirq+0x1c/0x30
[  638.962724]  [<ffffffff81004c7d>] do_softirq+0x8d/0xc0
[  638.964565]  [<ffffffff8105024e>] irq_exit+0x10e/0x150
[  638.966390]  [<ffffffff8174ad4a>] smp_apic_timer_interrupt+0x4a/0x60
[  638.968223]  [<ffffffff817499af>] apic_timer_interrupt+0x6f/0x80
[  638.970079]  <EOI>  [<ffffffff810b964f>] ? __lock_release+0x6f/0x100
[  638.971899]  [<ffffffff8173fa6a>] ? _raw_spin_unlock_irq+0x3a/0x50
[  638.973691]  [<ffffffff8173fa60>] ? _raw_spin_unlock_irq+0x30/0x50
[  638.975475]  [<ffffffff81562393>] md_set_badblocks+0x1f3/0x4a0
[  638.977243]  [<ffffffff81566e07>] rdev_set_badblocks+0x27/0x80
[  638.978988]  [<ffffffffa00d97bb>] raid5_end_read_request+0x36b/0x4e0 [raid456]
[  638.980723]  [<ffffffff811b5a1d>] bio_endio+0x1d/0x40
[  638.982463]  [<ffffffff81304ff3>] req_bio_endio.isra.65+0x83/0xa0
[  638.984214]  [<ffffffff81306b9f>] blk_update_request+0x7f/0x350
[  638.985967]  [<ffffffff81306ea1>] blk_update_bidi_request+0x31/0x90
[  638.987710]  [<ffffffff813085e0>] __blk_end_bidi_request+0x20/0x50
[  638.989439]  [<ffffffff8130862f>] __blk_end_request_all+0x1f/0x30
[  638.991149]  [<ffffffff81308746>] blk_peek_request+0x106/0x250
[  638.992861]  [<ffffffff814a62a9>] ? scsi_kill_request.isra.32+0xe9/0x130
[  638.994561]  [<ffffffff814a633a>] scsi_request_fn+0x4a/0x3d0
[  638.996251]  [<ffffffff813040a7>] __blk_run_queue+0x37/0x50
[  638.997900]  [<ffffffff813045af>] blk_run_queue+0x2f/0x50
[  638.999553]  [<ffffffff814a5750>] scsi_run_queue+0xe0/0x1c0
[  639.001185]  [<ffffffff814a7721>] scsi_run_host_queues+0x21/0x40
[  639.002798]  [<ffffffff814a2e87>] scsi_restart_operations+0x177/0x200
[  639.004391]  [<ffffffff814a4fe9>] scsi_error_handler+0xc9/0xe0
[  639.005996]  [<ffffffff814a4f20>] ? scsi_unjam_host+0xd0/0xd0
[  639.007600]  [<ffffffff81072f6b>] kthread+0xdb/0xe0
[  639.009205]  [<ffffffff81072e90>] ? flush_kthread_worker+0x170/0x170
[  639.010821]  [<ffffffff81748cac>] ret_from_fork+0x7c/0xb0
[  639.012437]  [<ffffffff81072e90>] ? flush_kthread_worker+0x170/0x170

This bug was introduce in commit  2e8ac30
(the first time rdev_set_badblock was call from interrupt context),
so this patch is appropriate for 3.5 and subsequent kernels.

Signed-off-by: Bian Yu <[email protected]>
Reviewed-by: Jianpeng Ma <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
gregnietsky pushed a commit to Distrotech/linux that referenced this pull request Apr 9, 2014
…ET pending data

commit 8822b64 upstream.

We accidentally call down to ip6_push_pending_frames when uncorking
pending AF_INET data on a ipv6 socket. This results in the following
splat (from Dave Jones):

skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:126!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
+netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ torvalds#37
task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
RIP: 0010:[<ffffffff816e759c>]  [<ffffffff816e759c>] skb_panic+0x63/0x65
RSP: 0018:ffff8801e6431de8  EFLAGS: 00010282
RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
FS:  00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
Call Trace:
 [<ffffffff8159a9aa>] skb_push+0x3a/0x40
 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0
 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140
 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0
 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20
 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0
 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0
 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40
 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20
 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0
 [<ffffffff816f5d54>] tracesys+0xdd/0xe2
Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
RIP  [<ffffffff816e759c>] skb_panic+0x63/0x65
 RSP <ffff8801e6431de8>

This patch adds a check if the pending data is of address family AF_INET
and directly calls udp_push_ending_frames from udp_v6_push_pending_frames
if that is the case.

This bug was found by Dave Jones with trinity.

(Also move the initialization of fl6 below the AF_INET check, even if
not strictly necessary.)

Cc: Dave Jones <[email protected]>
Cc: YOSHIFUJI Hideaki <[email protected]>
Signed-off-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[PG: The line "flowi6 *fl6 = &inet->cork.fl.u.ip6"  was
              "flowi *fl = &inet->cork.fl" in 2.6.34 kernel.]
Signed-off-by: Paul Gortmaker <[email protected]>
shr-buildhost pushed a commit to shr-distribution/linux that referenced this pull request May 18, 2014
Running one program that continuously hotplugs and replugs a cpu
concurrently with another program that continuously writes to the
scaling_set_speed node eventually deadlocks with:

=============================================
[ INFO: possible recursive locking detected ]
3.4.0 torvalds#37 Tainted: G        W
---------------------------------------------
filemonkey/122 is trying to acquire lock:
 (s_active#13){++++.+}, at: [<c01a3d28>] sysfs_remove_dir+0x9c/0xb4

but task is already holding lock:
 (s_active#13){++++.+}, at: [<c01a22f0>] sysfs_write_file+0xe8/0x140

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(s_active#13);
  lock(s_active#13);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by filemonkey/122:
 #0:  (&buffer->mutex){+.+.+.}, at: [<c01a2230>] sysfs_write_file+0x28/0x140
 #1:  (s_active#13){++++.+}, at: [<c01a22f0>] sysfs_write_file+0xe8/0x140

stack backtrace:
[<c0014fcc>] (unwind_backtrace+0x0/0x120) from [<c00ca600>] (validate_chain+0x6f8/0x1054)
[<c00ca600>] (validate_chain+0x6f8/0x1054) from [<c00cb778>] (__lock_acquire+0x81c/0x8d8)
[<c00cb778>] (__lock_acquire+0x81c/0x8d8) from [<c00cb9c0>] (lock_acquire+0x18c/0x1e8)
[<c00cb9c0>] (lock_acquire+0x18c/0x1e8) from [<c01a3ba8>] (sysfs_addrm_finish+0xd0/0x180)
[<c01a3ba8>] (sysfs_addrm_finish+0xd0/0x180) from [<c01a3d28>] (sysfs_remove_dir+0x9c/0xb4)
[<c01a3d28>] (sysfs_remove_dir+0x9c/0xb4) from [<c02d0e5c>] (kobject_del+0x10/0x38)
[<c02d0e5c>] (kobject_del+0x10/0x38) from [<c02d0f74>] (kobject_release+0xf0/0x194)
[<c02d0f74>] (kobject_release+0xf0/0x194) from [<c0565a98>] (cpufreq_cpu_put+0xc/0x24)
[<c0565a98>] (cpufreq_cpu_put+0xc/0x24) from [<c05683f0>] (store+0x6c/0x74)
[<c05683f0>] (store+0x6c/0x74) from [<c01a2314>] (sysfs_write_file+0x10c/0x140)
[<c01a2314>] (sysfs_write_file+0x10c/0x140) from [<c014af44>] (vfs_write+0xb0/0x128)
[<c014af44>] (vfs_write+0xb0/0x128) from [<c014b06c>] (sys_write+0x3c/0x68)
[<c014b06c>] (sys_write+0x3c/0x68) from [<c000e0e0>] (ret_fast_syscall+0x0/0x3c)

This is because store() in cpufreq.c indirectly grabs a kobject
with kobject_get() and is the last one to call kobject_put()
indirectly via cpufreq_cpu_put().

Fix this deadlock by introducing two new functions,
cpufreq_cpu_get_sysfs() and cpufreq_cpu_put_sysfs() which do the
same thing as cpufreq_cpu_{get,put}() but don't grab the kobject.

CRs-fixed: 366560
Change-Id: I23ea50a01793b2b2af0972cde52dba7396925fe3
Signed-off-by: Stephen Boyd <[email protected]>
gregnietsky pushed a commit to Distrotech/linux that referenced this pull request May 27, 2014
…ET pending data

CVE-2013-4162

BugLink: http://bugs.launchpad.net/bugs/1205070

We accidentally call down to ip6_push_pending_frames when uncorking
pending AF_INET data on a ipv6 socket. This results in the following
splat (from Dave Jones):

skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:126!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
+netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ torvalds#37
task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
RIP: 0010:[<ffffffff816e759c>]  [<ffffffff816e759c>] skb_panic+0x63/0x65
RSP: 0018:ffff8801e6431de8  EFLAGS: 00010282
RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
FS:  00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
Call Trace:
 [<ffffffff8159a9aa>] skb_push+0x3a/0x40
 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0
 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140
 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0
 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20
 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0
 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0
 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40
 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20
 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0
 [<ffffffff816f5d54>] tracesys+0xdd/0xe2
Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
RIP  [<ffffffff816e759c>] skb_panic+0x63/0x65
 RSP <ffff8801e6431de8>

This patch adds a check if the pending data is of address family AF_INET
and directly calls udp_push_ending_frames from udp_v6_push_pending_frames
if that is the case.

This bug was found by Dave Jones with trinity.

(Also move the initialization of fl6 below the AF_INET check, even if
not strictly necessary.)

Cc: Dave Jones <[email protected]>
Cc: YOSHIFUJI Hideaki <[email protected]>
Signed-off-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(back ported from commit 8822b64)
Signed-off-by: Luis Henriques <[email protected]>
Acked-by: Brad Figg <[email protected]>
Signed-off-by: Tim Gardner <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>
shr-buildhost pushed a commit to shr-distribution/linux that referenced this pull request Jun 3, 2014
Running one program that continuously hotplugs and replugs a cpu
concurrently with another program that continuously writes to the
scaling_set_speed node eventually deadlocks with:

=============================================
[ INFO: possible recursive locking detected ]
3.4.0 torvalds#37 Tainted: G        W
---------------------------------------------
filemonkey/122 is trying to acquire lock:
 (s_active#13){++++.+}, at: [<c01a3d28>] sysfs_remove_dir+0x9c/0xb4

but task is already holding lock:
 (s_active#13){++++.+}, at: [<c01a22f0>] sysfs_write_file+0xe8/0x140

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(s_active#13);
  lock(s_active#13);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by filemonkey/122:
 #0:  (&buffer->mutex){+.+.+.}, at: [<c01a2230>] sysfs_write_file+0x28/0x140
 #1:  (s_active#13){++++.+}, at: [<c01a22f0>] sysfs_write_file+0xe8/0x140

stack backtrace:
[<c0014fcc>] (unwind_backtrace+0x0/0x120) from [<c00ca600>] (validate_chain+0x6f8/0x1054)
[<c00ca600>] (validate_chain+0x6f8/0x1054) from [<c00cb778>] (__lock_acquire+0x81c/0x8d8)
[<c00cb778>] (__lock_acquire+0x81c/0x8d8) from [<c00cb9c0>] (lock_acquire+0x18c/0x1e8)
[<c00cb9c0>] (lock_acquire+0x18c/0x1e8) from [<c01a3ba8>] (sysfs_addrm_finish+0xd0/0x180)
[<c01a3ba8>] (sysfs_addrm_finish+0xd0/0x180) from [<c01a3d28>] (sysfs_remove_dir+0x9c/0xb4)
[<c01a3d28>] (sysfs_remove_dir+0x9c/0xb4) from [<c02d0e5c>] (kobject_del+0x10/0x38)
[<c02d0e5c>] (kobject_del+0x10/0x38) from [<c02d0f74>] (kobject_release+0xf0/0x194)
[<c02d0f74>] (kobject_release+0xf0/0x194) from [<c0565a98>] (cpufreq_cpu_put+0xc/0x24)
[<c0565a98>] (cpufreq_cpu_put+0xc/0x24) from [<c05683f0>] (store+0x6c/0x74)
[<c05683f0>] (store+0x6c/0x74) from [<c01a2314>] (sysfs_write_file+0x10c/0x140)
[<c01a2314>] (sysfs_write_file+0x10c/0x140) from [<c014af44>] (vfs_write+0xb0/0x128)
[<c014af44>] (vfs_write+0xb0/0x128) from [<c014b06c>] (sys_write+0x3c/0x68)
[<c014b06c>] (sys_write+0x3c/0x68) from [<c000e0e0>] (ret_fast_syscall+0x0/0x3c)

This is because store() in cpufreq.c indirectly grabs a kobject
with kobject_get() and is the last one to call kobject_put()
indirectly via cpufreq_cpu_put().

Fix this deadlock by introducing two new functions,
cpufreq_cpu_get_sysfs() and cpufreq_cpu_put_sysfs() which do the
same thing as cpufreq_cpu_{get,put}() but don't grab the kobject.

CRs-fixed: 366560

Signed-off-by: Ramakrishna Prasad N <[email protected]>
(cherry picked from commit ddda0cf)

Change-Id: I6ac89789e89099cff315d7a02de216423376b7f7
Signed-off-by: Shruthi Krishna <[email protected]>
Gnurou pushed a commit to Gnurou/linux that referenced this pull request Jul 1, 2014
Since crossbar is s/w configurable, the initial settings of the
crossbar cannot be assumed to be sane. This implies that:
a) On initialization all un-reserved crossbars must be initialized to
   a known 'safe' value.
b) When unmapping the interrupt, the safe value must be written to
   ensure that the crossbar mapping matches with interrupt controller
   usage.

So provide a safe value in the dt data to map if
'0' is not safe for the platform and use it during init and unmap

While at this, fix the below checkpatch warning.
Fixes checkpatch warning:
WARNING: Unnecessary space before function pointer arguments
 torvalds#37: FILE: drivers/irqchip/irq-crossbar.c:37:
 +	void (*write) (int, int);

Signed-off-by: Nishanth Menon <[email protected]>
Signed-off-by: Sricharan R <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jason Cooper <[email protected]>
koct9i pushed a commit to koct9i/linux that referenced this pull request Sep 23, 2014
ERROR: code indent should use tabs where possible
torvalds#37: FILE: include/linux/mmdebug.h:33:
+        do {^I^I^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#37: FILE: include/linux/mmdebug.h:33:
+        do {^I^I^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#38: FILE: include/linux/mmdebug.h:34:
+                if (unlikely(cond)) {^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#38: FILE: include/linux/mmdebug.h:34:
+                if (unlikely(cond)) {^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#39: FILE: include/linux/mmdebug.h:35:
+                        dump_mm(mm);^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#39: FILE: include/linux/mmdebug.h:35:
+                        dump_mm(mm);^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#40: FILE: include/linux/mmdebug.h:36:
+                        BUG();^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#40: FILE: include/linux/mmdebug.h:36:
+                        BUG();^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#41: FILE: include/linux/mmdebug.h:37:
+                }^I^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#41: FILE: include/linux/mmdebug.h:37:
+                }^I^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#42: FILE: include/linux/mmdebug.h:38:
+        } while (0)$

WARNING: please, no spaces at the start of a line
torvalds#42: FILE: include/linux/mmdebug.h:38:
+        } while (0)$

WARNING: Prefer [subsystem eg: netdev]_alert([subsystem]dev, ... then dev_alert(dev, ... then pr_alert(...  to printk(KERN_ALERT ...
torvalds#74: FILE: mm/debug.c:171:
+	printk(KERN_ALERT

total: 6 errors, 7 warnings, 109 lines checked

NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
      scripts/cleanfile

./patches/mm-introduce-vm_bug_on_mm.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Sasha Levin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
aryabinin pushed a commit to aryabinin/linux that referenced this pull request Sep 24, 2014
ERROR: code indent should use tabs where possible
torvalds#37: FILE: include/linux/mmdebug.h:33:
+        do {^I^I^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#37: FILE: include/linux/mmdebug.h:33:
+        do {^I^I^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#38: FILE: include/linux/mmdebug.h:34:
+                if (unlikely(cond)) {^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#38: FILE: include/linux/mmdebug.h:34:
+                if (unlikely(cond)) {^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#39: FILE: include/linux/mmdebug.h:35:
+                        dump_mm(mm);^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#39: FILE: include/linux/mmdebug.h:35:
+                        dump_mm(mm);^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#40: FILE: include/linux/mmdebug.h:36:
+                        BUG();^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#40: FILE: include/linux/mmdebug.h:36:
+                        BUG();^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#41: FILE: include/linux/mmdebug.h:37:
+                }^I^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#41: FILE: include/linux/mmdebug.h:37:
+                }^I^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#42: FILE: include/linux/mmdebug.h:38:
+        } while (0)$

WARNING: please, no spaces at the start of a line
torvalds#42: FILE: include/linux/mmdebug.h:38:
+        } while (0)$

WARNING: Prefer [subsystem eg: netdev]_alert([subsystem]dev, ... then dev_alert(dev, ... then pr_alert(...  to printk(KERN_ALERT ...
torvalds#74: FILE: mm/debug.c:171:
+	printk(KERN_ALERT

total: 6 errors, 7 warnings, 109 lines checked

NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
      scripts/cleanfile

./patches/mm-introduce-vm_bug_on_mm.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Sasha Levin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ddstreet pushed a commit to ddstreet/linux that referenced this pull request Sep 25, 2014
ERROR: code indent should use tabs where possible
torvalds#37: FILE: include/linux/mmdebug.h:33:
+        do {^I^I^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#37: FILE: include/linux/mmdebug.h:33:
+        do {^I^I^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#38: FILE: include/linux/mmdebug.h:34:
+                if (unlikely(cond)) {^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#38: FILE: include/linux/mmdebug.h:34:
+                if (unlikely(cond)) {^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#39: FILE: include/linux/mmdebug.h:35:
+                        dump_mm(mm);^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#39: FILE: include/linux/mmdebug.h:35:
+                        dump_mm(mm);^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#40: FILE: include/linux/mmdebug.h:36:
+                        BUG();^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#40: FILE: include/linux/mmdebug.h:36:
+                        BUG();^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#41: FILE: include/linux/mmdebug.h:37:
+                }^I^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#41: FILE: include/linux/mmdebug.h:37:
+                }^I^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#42: FILE: include/linux/mmdebug.h:38:
+        } while (0)$

WARNING: please, no spaces at the start of a line
torvalds#42: FILE: include/linux/mmdebug.h:38:
+        } while (0)$

WARNING: Prefer [subsystem eg: netdev]_alert([subsystem]dev, ... then dev_alert(dev, ... then pr_alert(...  to printk(KERN_ALERT ...
torvalds#74: FILE: mm/debug.c:171:
+	printk(KERN_ALERT

total: 6 errors, 7 warnings, 109 lines checked

NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
      scripts/cleanfile

./patches/mm-introduce-vm_bug_on_mm.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Sasha Levin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
koct9i pushed a commit to koct9i/linux that referenced this pull request Sep 27, 2014
ERROR: code indent should use tabs where possible
torvalds#37: FILE: include/linux/mmdebug.h:33:
+        do {^I^I^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#37: FILE: include/linux/mmdebug.h:33:
+        do {^I^I^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#38: FILE: include/linux/mmdebug.h:34:
+                if (unlikely(cond)) {^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#38: FILE: include/linux/mmdebug.h:34:
+                if (unlikely(cond)) {^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#39: FILE: include/linux/mmdebug.h:35:
+                        dump_mm(mm);^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#39: FILE: include/linux/mmdebug.h:35:
+                        dump_mm(mm);^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#40: FILE: include/linux/mmdebug.h:36:
+                        BUG();^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#40: FILE: include/linux/mmdebug.h:36:
+                        BUG();^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#41: FILE: include/linux/mmdebug.h:37:
+                }^I^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#41: FILE: include/linux/mmdebug.h:37:
+                }^I^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#42: FILE: include/linux/mmdebug.h:38:
+        } while (0)$

WARNING: please, no spaces at the start of a line
torvalds#42: FILE: include/linux/mmdebug.h:38:
+        } while (0)$

WARNING: Prefer [subsystem eg: netdev]_alert([subsystem]dev, ... then dev_alert(dev, ... then pr_alert(...  to printk(KERN_ALERT ...
torvalds#74: FILE: mm/debug.c:171:
+	printk(KERN_ALERT

total: 6 errors, 7 warnings, 109 lines checked

NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
      scripts/cleanfile

./patches/mm-introduce-vm_bug_on_mm.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Sasha Levin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
tom3q pushed a commit to tom3q/linux that referenced this pull request Oct 2, 2014
When doing vfio passthrough a VF, the kernel will crash with following
message:

[  442.656459] Unable to handle kernel paging request for data at address 0x00000060
[  442.656593] Faulting instruction address: 0xc000000000038b88
[  442.656706] Oops: Kernel access of bad area, sig: 11 [#1]
[  442.656798] SMP NR_CPUS=1024 NUMA PowerNV
[  442.656890] Modules linked in: vfio_pci mlx4_core nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack bnep bluetooth rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw tg3 nfsd be2net nfs_acl ses lockd ptp enclosure pps_core kvm_hv kvm_pr shpchp binfmt_misc kvm sunrpc uinput lpfc scsi_transport_fc ipr scsi_tgt [last unloaded: mlx4_core]
[  442.658152] CPU: 40 PID: 14948 Comm: qemu-system-ppc Not tainted 3.10.42yw-pkvm+ torvalds#37
[  442.658219] task: c000000f7e2a9a00 ti: c000000f6dc3c000 task.ti: c000000f6dc3c000
[  442.658287] NIP: c000000000038b88 LR: c0000000004435a8 CTR: c000000000455bc0
[  442.658352] REGS: c000000f6dc3f580 TRAP: 0300   Not tainted  (3.10.42yw-pkvm+)
[  442.658419] MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 28004882  XER: 20000000
[  442.658577] CFAR: c00000000000908c DAR: 0000000000000060 DSISR: 40000000 SOFTE: 1
GPR00: c0000000004435a8 c000000f6dc3f800 c0000000012b1c10 c00000000da24000
GPR04: 0000000000000003 0000000000001004 00000000000015b3 000000000000ffff
GPR08: c00000000127f5d8 0000000000000000 000000000000ffff 0000000000000000
GPR12: c000000000068078 c00000000fdd6800 000001003c320c80 000001003c3607f0
GPR16: 0000000000000001 00000000105480c8 000000001055aaa8 000001003c31ab18
GPR20: 000001003c10fb40 000001003c360ae8 000000001063bcf0 000000001063bdb0
GPR24: 000001003c15ed70 0000000010548f40 c000001fe5514c88 c000001fe5514cb0
GPR28: c00000000da24000 0000000000000000 c00000000da24000 0000000000000003
[  442.659471] NIP [c000000000038b88] .pcibios_set_pcie_reset_state+0x28/0x130
[  442.659530] LR [c0000000004435a8] .pci_set_pcie_reset_state+0x28/0x40
[  442.659585] Call Trace:
[  442.659610] [c000000f6dc3f800] [00000000000719e0] 0x719e0 (unreliable)
[  442.659677] [c000000f6dc3f880] [c0000000004435a8] .pci_set_pcie_reset_state+0x28/0x40
[  442.659757] [c000000f6dc3f900] [c000000000455bf8] .reset_fundamental+0x38/0x80
[  442.659835] [c000000f6dc3f980] [c0000000004562a8] .pci_dev_specific_reset+0xa8/0xf0
[  442.659913] [c000000f6dc3fa00] [c0000000004448c4] .__pci_dev_reset+0x44/0x430
[  442.659980] [c000000f6dc3fab0] [c000000000444d5c] .pci_reset_function+0x7c/0xc0
[  442.660059] [c000000f6dc3fb30] [d00000001c141ab8] .vfio_pci_open+0xe8/0x2b0 [vfio_pci]
[  442.660139] [c000000f6dc3fbd0] [c000000000586c30] .vfio_group_fops_unl_ioctl+0x3a0/0x630
[  442.660219] [c000000f6dc3fc90] [c000000000255fbc] .do_vfs_ioctl+0x4ec/0x7c0
[  442.660286] [c000000f6dc3fd80] [c000000000256364] .SyS_ioctl+0xd4/0xf0
[  442.660354] [c000000f6dc3fe30] [c000000000009e54] syscall_exit+0x0/0x98
[  442.660420] Instruction dump:
[  442.660454] 4bfffce9 4bfffee4 7c0802a6 fbc1fff0 fbe1fff8 f8010010 f821ff81 7c7e1b78
[  442.660566] 7c9f2378 60000000 60000000 e93e02c8 <e8690060> 2fa30000 41de00c4 2b9f0002
[  442.660679] ---[ end trace a64ac9546bcf0328 ]---
[  442.660724]

The reason is current VF is not EEH enabled.

This patch introduces a macro to convert eeh_dev to eeh_pe. By doing so, it
will prevent converting with NULL pointer.

Signed-off-by: Wei Yang <[email protected]>
Acked-by: Gavin Shan <[email protected]>
CC: Michael Ellerman <[email protected]>

V3 -> V4:
   1. move the macro definition from include/linux/pci.h to
      arch/powerpc/include/asm/eeh.h

V2 -> V3:
   1. rebased on 3.17-rc4
   2. introduce a macro
   3. use this macro in several other places

V1 -> V2:
   1. code style and patch subject adjustment

Signed-off-by: Michael Ellerman <[email protected]>
tom3q pushed a commit to tom3q/linux that referenced this pull request Oct 2, 2014
ERROR: code indent should use tabs where possible
torvalds#37: FILE: include/linux/mmdebug.h:33:
+        do {^I^I^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#37: FILE: include/linux/mmdebug.h:33:
+        do {^I^I^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#38: FILE: include/linux/mmdebug.h:34:
+                if (unlikely(cond)) {^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#38: FILE: include/linux/mmdebug.h:34:
+                if (unlikely(cond)) {^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#39: FILE: include/linux/mmdebug.h:35:
+                        dump_mm(mm);^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#39: FILE: include/linux/mmdebug.h:35:
+                        dump_mm(mm);^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#40: FILE: include/linux/mmdebug.h:36:
+                        BUG();^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#40: FILE: include/linux/mmdebug.h:36:
+                        BUG();^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#41: FILE: include/linux/mmdebug.h:37:
+                }^I^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#41: FILE: include/linux/mmdebug.h:37:
+                }^I^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#42: FILE: include/linux/mmdebug.h:38:
+        } while (0)$

WARNING: please, no spaces at the start of a line
torvalds#42: FILE: include/linux/mmdebug.h:38:
+        } while (0)$

WARNING: Prefer [subsystem eg: netdev]_alert([subsystem]dev, ... then dev_alert(dev, ... then pr_alert(...  to printk(KERN_ALERT ...
torvalds#74: FILE: mm/debug.c:171:
+	printk(KERN_ALERT

total: 6 errors, 7 warnings, 109 lines checked

NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
      scripts/cleanfile

./patches/mm-introduce-vm_bug_on_mm.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Sasha Levin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
aryabinin pushed a commit to aryabinin/linux that referenced this pull request Oct 3, 2014
ERROR: code indent should use tabs where possible
torvalds#37: FILE: include/linux/mmdebug.h:33:
+        do {^I^I^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#37: FILE: include/linux/mmdebug.h:33:
+        do {^I^I^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#38: FILE: include/linux/mmdebug.h:34:
+                if (unlikely(cond)) {^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#38: FILE: include/linux/mmdebug.h:34:
+                if (unlikely(cond)) {^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#39: FILE: include/linux/mmdebug.h:35:
+                        dump_mm(mm);^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#39: FILE: include/linux/mmdebug.h:35:
+                        dump_mm(mm);^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#40: FILE: include/linux/mmdebug.h:36:
+                        BUG();^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#40: FILE: include/linux/mmdebug.h:36:
+                        BUG();^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#41: FILE: include/linux/mmdebug.h:37:
+                }^I^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#41: FILE: include/linux/mmdebug.h:37:
+                }^I^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#42: FILE: include/linux/mmdebug.h:38:
+        } while (0)$

WARNING: please, no spaces at the start of a line
torvalds#42: FILE: include/linux/mmdebug.h:38:
+        } while (0)$

WARNING: Prefer [subsystem eg: netdev]_alert([subsystem]dev, ... then dev_alert(dev, ... then pr_alert(...  to printk(KERN_ALERT ...
torvalds#74: FILE: mm/debug.c:171:
+	printk(KERN_ALERT

total: 6 errors, 7 warnings, 109 lines checked

NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
      scripts/cleanfile

./patches/mm-introduce-vm_bug_on_mm.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Sasha Levin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
aryabinin referenced this pull request in aryabinin/linux Oct 3, 2014
GIT 6fe676b243e5a0cb4cc4d9a4b094de8db0cdbf74

commit e500f488c27659bb6f5d313b336621f3daa67701
Author: Fabian Frederick <[email protected]>
Date:   Wed Oct 1 06:52:06 2014 +0200

    net/dccp/ccid.c: add __init to ccid_activate
    
    ccid_activate is only called by __init ccid_initialize_builtins in same module.
    
    Signed-off-by: Fabian Frederick <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 0c5b8a46294d43fc63788839d3c18de0961ec1bc
Author: Fabian Frederick <[email protected]>
Date:   Wed Oct 1 06:48:03 2014 +0200

    net/dccp/proto.c: add __init to dccp_mib_init
    
    dccp_mib_init is only called by __init dccp_init in same module.
    
    Signed-off-by: Fabian Frederick <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 082f58ac4a48d3f5cb4597232cb2ac6823a96f43
Author: Quinn Tran <[email protected]>
Date:   Thu Sep 25 06:22:28 2014 -0400

    target: Fix queue full status NULL pointer for SCF_TRANSPORT_TASK_SENSE
    
    During temporary resource starvation at lower transport layer, command
    is placed on queue full retry path, which expose this problem.  The TCM
    queue full handling of SCF_TRANSPORT_TASK_SENSE currently sends the same
    cmd twice to lower layer.  The 1st time led to cmd normal free path.
    The 2nd time cause Null pointer access.
    
    This regression bug was originally introduced v3.1-rc code in the
    following commit:
    
    commit e057f53308a5f071556ee80586b99ee755bf07f5
    Author: Christoph Hellwig <[email protected]>
    Date:   Mon Oct 17 13:56:41 2011 -0400
    
        target: remove the transport_qf_callback se_cmd callback
    
    Signed-off-by: Quinn Tran <[email protected]>
    Signed-off-by: Saurav Kashyap <[email protected]>
    Cc: <[email protected]> # v3.1+
    Signed-off-by: Nicholas Bellinger <[email protected]>

commit db3a99b9921f27fe71ca8c0f218ee810e0e7fb69
Author: Joern Engel <[email protected]>
Date:   Tue Sep 16 16:23:19 2014 -0400

    qla_target: rearrange struct qla_tgt_prm
    
    On most (non-x86) 64bit platforms this will remove 8 padding bytes
    from the structure.
    
    Signed-off-by: Joern Engel <[email protected]>
    Signed-off-by: Nicholas Bellinger <[email protected]>

commit f9b6721a9cef94908467abf7a2cacbd15a7d23cb
Author: Joern Engel <[email protected]>
Date:   Tue Sep 16 16:23:18 2014 -0400

    qla_target: improve qlt_unmap_sg()
    
    Remove the inline attribute.  Modern compilers ignore it and the
    function has grown beyond where inline made sense anyway.
    Remove the BUG_ON(!cmd->sg_mapped), and instead return if sg_mapped is
    not set.  Every caller is doing this check, so we might as well have it
    in one place instead of four.
    
    Signed-off-by: Joern Engel <[email protected]>
    Signed-off-by: Nicholas Bellinger <[email protected]>

commit 55a9066fffd2f533e7ed434b072469ef09d6c476
Author: Joern Engel <[email protected]>
Date:   Tue Sep 16 16:23:15 2014 -0400

    qla_target: make some global functions static
    
    Also removes the declarations from the header - including two
    declarations without function definitions or callers.
    
    Signed-off-by: Joern Engel <[email protected]>
    Signed-off-by: Nicholas Bellinger <[email protected]>

commit c57010420654aca179c500f61e86315a337244ca
Author: Joern Engel <[email protected]>
Date:   Tue Sep 16 16:23:14 2014 -0400

    qla_target: remove unused parameter
    
    Signed-off-by: Joern Engel <[email protected]>
    Signed-off-by: Nicholas Bellinger <[email protected]>

commit f81ccb489a7a641c1bed41b49cf8d72c199c68d5
Author: Joern Engel <[email protected]>
Date:   Tue Sep 16 16:23:13 2014 -0400

    target: simplify core_tmr_abort_task
    
    list_for_each_entry_safe is necessary if list objects are deleted from
    the list while traversing it.  Not the case here, so we can use the base
    list_for_each_entry variant.
    
    Signed-off-by: Joern Engel <[email protected]>
    Signed-off-by: Nicholas Bellinger <[email protected]>

commit 33940d09937276cd3c81f2874faf43e37c2db0e2
Author: Joern Engel <[email protected]>
Date:   Tue Sep 16 16:23:12 2014 -0400

    target: encapsulate smp_mb__after_atomic()
    
    The target code has a rather generous helping of smp_mb__after_atomic()
    throughout the code base.  Most atomic operations were followed by one
    and none were preceded by smp_mb__before_atomic(), nor accompanied by a
    comment explaining the need for a barrier.
    
    Instead of trying to prove for every case whether or not it is needed,
    this patch introduces atomic_inc_mb() and atomic_dec_mb(), which
    explicitly include the memory barriers before and after the atomic
    operation.  For now they are defined in a target header, although they
    could be of general use.
    
    Most of the existing atomic/mb combinations were replaced by the new
    helpers.  In a few cases the atomic was sandwiched in
    spin_lock/spin_unlock and I simply removed the barrier.
    
    I suspect that in most cases the correct conversion would have been to
    drop the barrier.  I also suspect that a few cases exist where a) the
    barrier was necessary and b) a second barrier before the atomic would
    have been necessary and got added by this patch.
    
    Signed-off-by: Joern Engel <[email protected]>
    Signed-off-by: Nicholas Bellinger <[email protected]>

commit 74ed7e62289dc6d388996d7c8f89c2e7e95b9657
Author: Joern Engel <[email protected]>
Date:   Tue Sep 16 16:23:11 2014 -0400

    target: remove some smp_mb__after_atomic()s
    
    atomic_inc_return() already does an implicit memory barrier and the
    second case was moved from an atomic to a plain flag operation.  If a
    barrier were needed in the second case, it would have to be smp_mb(),
    not a variant optimized away for x86 and other architectures.
    
    Signed-off-by: Joern Engel <[email protected]>
    Signed-off-by: Nicholas Bellinger <[email protected]>

commit 8f83269048628d7b139dacbfc6cc97befcbdd2e9
Author: Joern Engel <[email protected]>
Date:   Tue Sep 16 16:23:10 2014 -0400

    target: simplify core_tmr_release_req()
    
    And while at it, do minimal coding style fixes in the area.
    
    Signed-off-by: Joern Engel <[email protected]>
    Signed-off-by: Nicholas Bellinger <[email protected]>

commit 9c7d6154bc4b9dfefd580490cdca5f7c72321464
Author: Andy Grover <[email protected]>
Date:   Mon Jun 30 16:39:46 2014 -0700

    target: Remove core_tpg_release_virtual_lun0 function
    
    Simple and just called from one place.
    
    Reviewed-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Andy Grover <[email protected]>
    Signed-off-by: Nicholas Bellinger <[email protected]>

commit cd9d7cbaec8b622eee4edcd8bf481c4047f74915
Author: Andy Grover <[email protected]>
Date:   Mon Jun 30 16:39:44 2014 -0700

    target: Change core_dev_del_lun to take a se_lun instead of unpacked_lun
    
    Remove core_tpg_pre_dellun entirely, since we don't need to get/check
    a pointer we already have.
    
    Nothing else can return an error, so core_dev_del_lun can return void.
    
    Rename core_tpg_post_dellun to remove_lun - a clearer name, now that
    pre_dellun is gone.
    
    Reviewed-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Andy Grover <[email protected]>
    Signed-off-by: Nicholas Bellinger <[email protected]>

commit cc83881f2c57caaf4b14adaffa65595640a59661
Author: Andy Grover <[email protected]>
Date:   Mon Jun 30 16:39:43 2014 -0700

    target: core_tpg_post_dellun can return void
    
    Nothing in it can raise an error.
    
    Reviewed-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Andy Grover <[email protected]>
    Signed-off-by: Nicholas Bellinger <[email protected]>

commit 49be17235c0acd96f2ff0fe282867fe3a83f554c
Author: hayeswang <[email protected]>
Date:   Wed Oct 1 13:25:11 2014 +0800

    r8152: disable power cut for RTL8153
    
    The firmware would be clear when the power cut is enabled for
    RTL8153.
    
    Signed-off-by: Hayes Wang <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 204c8704128943bf3f8b605f4b40bdc2b6bd89dc
Author: hayeswang <[email protected]>
Date:   Wed Oct 1 13:25:10 2014 +0800

    r8152: remove clearing bp
    
    The xxx_clear_bp() is used to halt the firmware. It only necessary
    for updating the new firmware. Besides, depend on the version of
    the current firmware, it may have problem to halt the firmware
    directly. Finally, halt the firmware would let the firmware code
    useless, and the bugs which are fixed by the firmware would occur.
    
    Signed-off-by: Hayes Wang <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit aa55c8e2f7a395dfc9e67fc6637321e19ce9bfe1
Author: Masahiro Yamada <[email protected]>
Date:   Tue Sep 9 20:02:24 2014 +0900

    kbuild: handle C=... and M=... after entering into build directory
    
    This commit avoids processing C=... and M=... twice
    when O=... is also given.
    
    Besides, we can also remove KBUILD_EXTMOD="$(KBUILD_EXTMOD)"
    in the sub-make target.
    
    Signed-off-by: Masahiro Yamada <[email protected]>
    Acked-by: Peter Foley <[email protected]>
    Signed-off-by: Michal Marek <[email protected]>

commit 745a254322c898dadf019342cd7140f7867d2d0f
Author: Masahiro Yamada <[email protected]>
Date:   Tue Sep 9 20:02:23 2014 +0900

    kbuild: use $(Q) for sub-make target
    
    Since commit 066b7ed9558087a7957a1128f27d7a3462ff117f
    (kbuild: Do not print the build directory with make -s),
    "Q" is defined above the sub-make target.
    
    This commit takes advantage of that and replaces
    "$(if $(KBUILD_VERBOSE:1=),@)" with "$(Q)".
    
    Signed-off-by: Masahiro Yamada <[email protected]>
    Acked-by: Peter Foley <[email protected]>
    Signed-off-by: Michal Marek <[email protected]>

commit 7ff525712acf9325e9acdb27bbc93049ea2e850c
Author: Masahiro Yamada <[email protected]>
Date:   Tue Sep 9 20:02:22 2014 +0900

    kbuild: fake the "Entering directory ..." message more simply
    
    Commit c2e28dc975ea87feed84415006ae143424912ac7
    (kbuild: Print the name of the build directory)
    added a gimmick to show the "Entering directory ...".
    
    Instead of echoing the hard-coded message (that is, we need to know
    the exact message), moving --no-print-directory would be easier.
    
    Signed-off-by: Masahiro Yamada <[email protected]>
    Acked-by: Peter Foley <[email protected]>
    Signed-off-by: Michal Marek <[email protected]>

commit 1b0ecb28b0cc216535ce6477d39aa610c3ff68a1
Author: Vlad Yasevich <[email protected]>
Date:   Tue Sep 30 19:39:37 2014 -0400

    bnx2: Correctly receive full sized 802.1ad fragmes
    
    This driver, similar to tg3, has a check that will
    cause full sized 802.1ad frames to be dropped.  The
    frame will be larger then the standard mtu due to the
    presense of vlan header that has not been stripped.
    The driver should not drop this frame and should process
    it just like it does for 802.1q.
    
    CC: Sony Chacko <[email protected]>
    CC: [email protected]
    Signed-off-by: Vladislav Yasevich <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 7d3083ee36b51e425b6abd76778a2046906b0fd3
Author: Vlad Yasevich <[email protected]>
Date:   Tue Sep 30 19:39:36 2014 -0400

    tg3: Allow for recieve of full-size 8021AD frames
    
    When receiving a vlan-tagged frame that still contains
    a vlan header, the length of the packet will be greater
    then MTU+ETH_HLEN since it will account of the extra
    vlan header.  TG3 checks this for the case for 802.1Q,
    but not for 802.1ad.  As a result, full sized 802.1ad
    frames get dropped by the card.
    
    Add a check for 802.1ad protocol when receving full
    sized frames.
    
    Suggested-by: Prashant Sreedharan <[email protected]>
    CC: Prashant Sreedharan <[email protected]>
    CC: Michael Chan <[email protected]>
    Signed-off-by: Vladislav Yasevich <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 1e918876853aa85435e0f17fd8b4a92dcfff53d6
Author: Florian Westphal <[email protected]>
Date:   Wed Oct 1 13:38:03 2014 +0200

    r8169: add support for Byte Queue Limits
    
    tested on RTL8168d/8111d model using 'super_netperf 40' with TCP/UDP_STREAM.
    
    Output of
    while true; do
        for n in inflight limit; do
              echo -n $n\ ; cat $n;
        done;
        sleep 1;
    done
    
    during netperf run, 100mbit peer:
    
    inflight 0
    limit 3028
    inflight 6056
    limit 4542
    
    [ trimmed output for brevity, no limit/inflight changes during
      test steady-state ]
    
    limit 4542
    inflight 3028
    limit 6122
    inflight 0
    limit 6122
    [ changed cable to 1gbit peer, restart netperf ]
    inflight 37850
    limit 36336
    inflight 33308
    limit 31794
    inflight 33308
    limit 31794
    inflight 27252
    limit 25738
    [ again, no changes during test ]
    inflight 27252
    limit 25738
    inflight 0
    limit 28766
    [ change cable to 100mbit peer, restart netperf ]
    limit 28766
    inflight 27370
    limit 28766
    inflight 4542
    limit 5990
    inflight 6056
    limit 4542
    [ .. ]
    inflight 6056
    limit 4542
    inflight 0
    
    [end of test]
    
    Cc: Francois Romieu <[email protected]>
    Cc: Hayes Wang <[email protected]>
    Signed-off-by: Florian Westphal <[email protected]>
    Acked-by: Eric Dumazet <[email protected]>
    Acked-by: Tom Herbert <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit d0bf4a9e92b9a93ffeeacbd7b6cb83e0ee3dc2ef
Author: Eric Dumazet <[email protected]>
Date:   Mon Sep 29 13:29:15 2014 -0700

    net: cleanup and document skb fclone layout
    
    Lets use a proper structure to clearly document and implement
    skb fast clones.
    
    Then, we might experiment more easily alternative layouts.
    
    This patch adds a new skb_fclone_busy() helper, used by tcp and xfrm,
    to stop leaking of implementation details.
    
    Signed-off-by: Eric Dumazet <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 0f1ca65ee50df042051e8fa3a14f73b0c71d45b9
Author: Arianna Avanzini <[email protected]>
Date:   Fri Aug 22 13:20:02 2014 +0200

    xen, blkfront: factor out flush-related checks from do_blkif_request()
    
    This commit factors out some checks related to the request insertion
    path, which can be done in an function instead of by itself.
    
    Reviewed-by: David Vrabel <[email protected]>
    Signed-off-by: Arianna Avanzini <[email protected]>
    Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>

commit 61cecca865280bef4f8a9748d0a9afa5df351ac2
Author: Roger Pau Monné <[email protected]>
Date:   Mon Sep 15 11:55:27 2014 +0200

    xen-blkback: fix leak on grant map error path
    
    Fix leaking a page when a grant mapping has failed.
    
    CC: [email protected]
    Signed-off-by: Roger Pau Monné <[email protected]>
    Reported-and-Tested-by: Tao Chen <[email protected]>
    Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>

commit 12ea729645ace01e08f9654df155622898d3aae6
Author: Vitaly Kuznetsov <[email protected]>
Date:   Mon Sep 8 15:21:33 2014 +0200

    xen/blkback: unmap all persistent grants when frontend gets disconnected
    
    blkback does not unmap persistent grants when frontend goes to Closed
    state (e.g. when blkfront module is being removed). This leads to the
    following in guest's dmesg:
    
    [  343.243825] xen:grant_table: WARNING: g.e. 0x445 still in use!
    [  343.243825] xen:grant_table: WARNING: g.e. 0x42a still in use!
    ...
    
    When load module -> use device -> unload module sequence is performed multiple times
    it is possible to hit BUG() condition in blkfront module:
    
    [  343.243825] kernel BUG at drivers/block/xen-blkfront.c:954!
    [  343.243825] invalid opcode: 0000 [#1] SMP
    [  343.243825] Modules linked in: xen_blkfront(-) ata_generic pata_acpi [last unloaded: xen_blkfront]
    ...
    [  343.243825] Call Trace:
    [  343.243825]  [<ffffffff814111ef>] ? unregister_xenbus_watch+0x16f/0x1e0
    [  343.243825]  [<ffffffffa0016fbf>] blkfront_remove+0x3f/0x140 [xen_blkfront]
    ...
    [  343.243825] RIP  [<ffffffffa0016aae>] blkif_free+0x34e/0x360 [xen_blkfront]
    [  343.243825]  RSP <ffff88001eb8fdc0>
    
    We don't need to keep these grants if we're disconnecting as frontend might already
    forgot about them. Solve the issue by moving xen_blkbk_free_caches() call from
    xen_blkif_free() to xen_blkif_disconnect().
    
    Now we can see the following:
    [  928.590893] xen:grant_table: WARNING: g.e. 0x587 still in use!
    [  928.591861] xen:grant_table: WARNING: g.e. 0x372 still in use!
    ...
    [  929.592146] xen:grant_table: freeing g.e. 0x587
    [  929.597174] xen:grant_table: freeing g.e. 0x372
    ...
    
    Backend does not keep persistent grants any more, reconnect works fine.
    
    CC: [email protected]
    Signed-off-by: Vitaly Kuznetsov <[email protected]>
    Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>

commit b248230c34970a6c1c17c591d63b464e8d2cfc33
Author: Yuchung Cheng <[email protected]>
Date:   Mon Sep 29 13:20:38 2014 -0700

    tcp: abort orphan sockets stalling on zero window probes
    
    Currently we have two different policies for orphan sockets
    that repeatedly stall on zero window ACKs. If a socket gets
    a zero window ACK when it is transmitting data, the RTO is
    used to probe the window. The socket is aborted after roughly
    tcp_orphan_retries() retries (as in tcp_write_timeout()).
    
    But if the socket was idle when it received the zero window ACK,
    and later wants to send more data, we use the probe timer to
    probe the window. If the receiver always returns zero window ACKs,
    icsk_probes keeps getting reset in tcp_ack() and the orphan socket
    can stall forever until the system reaches the orphan limit (as
    commented in tcp_probe_timer()). This opens up a simple attack
    to create lots of hanging orphan sockets to burn the memory
    and the CPU, as demonstrated in the recent netdev post "TCP
    connection will hang in FIN_WAIT1 after closing if zero window is
    advertised." http://www.spinics.net/lists/netdev/msg296539.html
    
    This patch follows the design in RTO-based probe: we abort an orphan
    socket stalling on zero window when the probe timer reaches both
    the maximum backoff and the maximum RTO. For example, an 100ms RTT
    connection will timeout after roughly 153 seconds (0.3 + 0.6 +
    .... + 76.8) if the receiver keeps the window shut. If the orphan
    socket passes this check, but the system already has too many orphans
    (as in tcp_out_of_resources()), we still abort it but we'll also
    send an RST packet as the connection may still be active.
    
    In addition, we change TCP_USER_TIMEOUT to cover (life or dead)
    sockets stalled on zero-window probes. This changes the semantics
    of TCP_USER_TIMEOUT slightly because it previously only applies
    when the socket has pending transmission.
    
    Signed-off-by: Yuchung Cheng <[email protected]>
    Signed-off-by: Eric Dumazet <[email protected]>
    Signed-off-by: Neal Cardwell <[email protected]>
    Reported-by: Andrey Dmitrov <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 3edfe0030bb7a82dab2a30a29ea6e1800e600c4b
Author: Helge Deller <[email protected]>
Date:   Wed Oct 1 22:11:01 2014 +0200

    parisc: Fix serial console for machines with serial port on superio chip
    
    Fix the serial console on machines where the serial port is located on
    the SuperIO chip.
    
    Signed-off-by: Helge Deller <[email protected]>
    Cc: Peter Hurley <[email protected]>

commit baf378126b08474de2e2428b16e62a69df0339d9
Author: Michael Opdenacker <[email protected]>
Date:   Wed Oct 1 14:07:39 2014 -0600

    rsxx: Remove deprecated IRQF_DISABLED
    
    This removes the use of the IRQF_DISABLED flag
    from drivers/block/rsxx/core.c
    
    It's a NOOP since 2.6.35 and it will be removed one day.
    
    Signed-off-by: Michael Opdenacker <[email protected]>
    Acked-by Philip Kelleher <[email protected]>
    Signed-off-by: Jens Axboe <[email protected]>

commit cb57659a15c6c0576493cc8a10474ce7ffd44eb3
Author: Fabian Frederick <[email protected]>
Date:   Wed Oct 1 19:30:03 2014 +0200

    cipso: add __init to cipso_v4_cache_init
    
    cipso_v4_cache_init is only called by __init cipso_v4_init
    
    Signed-off-by: Fabian Frederick <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 57a02c39c1c20ed03a86f8014c11a8c18b94cac3
Author: Fabian Frederick <[email protected]>
Date:   Wed Oct 1 19:18:57 2014 +0200

    inet: frags: add __init to ip4_frags_ctl_register
    
    ip4_frags_ctl_register is only called by __init ipfrag_init
    
    Signed-off-by: Fabian Frederick <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 47d7a88c188f06ffaea3a539f84fe10cb4e77787
Author: Fabian Frederick <[email protected]>
Date:   Wed Oct 1 18:27:50 2014 +0200

    tcp: add __init to tcp_init_mem
    
    tcp_init_mem is only called by __init tcp_init.
    
    Signed-off-by: Fabian Frederick <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit ee7a1beb9759c94aea67dd887faf5e447a5c6710
Author: Chun-Hao Lin <[email protected]>
Date:   Wed Oct 1 23:17:21 2014 +0800

    r8169:call "rtl8168_driver_start" "rtl8168_driver_stop" only when hardware dash function is enabled
    
    These two functions are used to inform dash firmware that driver is been
    brought up or brought down. So call these two functions only when hardware dash
    function is enabled.
    
    Signed-off-by: Chun-Hao Lin <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 2a9b4d9670e71784896d95c41c9b0acd50db1dbb
Author: Chun-Hao Lin <[email protected]>
Date:   Wed Oct 1 23:17:20 2014 +0800

    r8169:modify the behavior of function "rtl8168_oob_notify"
    
    In function "rtl8168_oob_notify", using function "rtl_eri_write" to access
    eri register 0xe8, instead of using MAC register "ERIDR" and "ERIAR" to
    access it.
    
    For using function "rtl_eri_write" in function "rtl8168_oob_notify", need to
    move down "rtl8168_oob_notify" related functions under the function
    "rtl_eri_write".
    
    Signed-off-by: Chun-Hao Lin <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 2f8c040ce6791ef0477e6d59768ee3d5fd0df0fd
Author: Chun-Hao Lin <[email protected]>
Date:   Wed Oct 1 23:17:19 2014 +0800

    r8169:change the name of function "r8168dp_check_dash" to "r8168_check_dash"
    
    DASH function not only RTL8168DP can support, but also RTL8168EP.
    So change the name of function "r8168dp_check_dash" to "r8168_check_dash".
    
    Signed-off-by: Chun-Hao Lin <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 706123d06c18b55da5e9da21e2d138ee789bf8f4
Author: Chun-Hao Lin <[email protected]>
Date:   Wed Oct 1 23:17:18 2014 +0800

    r8169:change the name of function"rtl_w1w0_eri"
    
    Change the name of function "rtl_w1w0_eri" to "rtl_w0w1_eri".
    
    In this function, the local variable "val" is "write zeros then write ones".
    Please see below code.
    
    (val & ~m) | p
    
    In this patch, change the function name from "xx_w1w0_xx" to "xx_w0w1_xx".
    The changed function name is more suitable for it's behavior.
    
    Signed-off-by: Chun-Hao Lin <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 7656442824f6174b56a19c664fe560972df56ad4
Author: Chun-Hao Lin <[email protected]>
Date:   Wed Oct 1 23:17:17 2014 +0800

    r8169:for function "rtl_w1w0_phy" change its name and behavior
    
    Change function name from "rtl_w1w0_phy" to "rtl_w0w1_phy".
    And its behavior from "write ones then write zeros" to
    "write zeros then write ones".
    
    In Realtek internal driver, bitwise operations are almost "write zeros then
    write ones". For easy to port hardware parameters from Realtek internal driver
    to Linux kernal driver "r8169", we would like to change this function's
    behavior and its name.
    
    Signed-off-by: Chun-Hao Lin <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit ac85bcdbc0ffd3903d6db4abcd769ecacf98605b
Author: Chun-Hao Lin <[email protected]>
Date:   Wed Oct 1 23:17:16 2014 +0800

    r8169:add more chips to support magic packet v2
    
    For RTL8168F RTL8168FB RTL8168G RTL8168GU RTL8411 RTL8411B RTL8402 RTL8107E,
    the magic packet enable bit is changed to eri 0xde bit0.
    
    In this patch, change magic packet enable bit of these chips to eri 0xde bit0.
    
    Signed-off-by: Chun-Hao Lin <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 89cceb2729c752e6ff9b3bc8650a70f29884f116
Author: Chun-Hao Lin <[email protected]>
Date:   Wed Oct 1 23:17:15 2014 +0800

    r8169:add support more chips to get mac address from backup mac address register
    
    RTL8168FB RTL8168G RTL8168GU RTL8411 RTL8411B RTL8106EUS RTL8402 can
    support get mac address from backup mac address register.
    
    Signed-off-by: Chun-Hao Lin <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 42fde7371035144037844f41bd16950de9912bdb
Author: Chun-Hao Lin <[email protected]>
Date:   Wed Oct 1 23:17:14 2014 +0800

    r8169:add disable/enable RTL8411B pll function
    
    RTL8411B can support disable/enable pll function.
    
    Signed-off-by: Chun-Hao Lin <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit b8e5e6ad7115befef13a4493f1d2b8e438abc058
Author: Chun-Hao Lin <[email protected]>
Date:   Wed Oct 1 23:17:13 2014 +0800

    r8169:add disable/enable RTL8168G pll function
    
    RTL8168G also can disable/enable pll function.
    
    Signed-off-by: Chun-Hao Lin <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 05b9687bb3606190304f08c2e4cd63de8717e30b
Author: Chun-Hao Lin <[email protected]>
Date:   Wed Oct 1 23:17:12 2014 +0800

    r8169:change uppercase number to lowercase number
    
    Signed-off-by: Chun-Hao Lin <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit a29c9c43bb633a9965909cd548879fee4aa789a4
Author: David L Stevens <[email protected]>
Date:   Wed Oct 1 11:05:27 2014 -0400

    sunvnet: fix potential NULL pointer dereference
    
    One of the error cases for vnet_start_xmit()'s "out_dropped" label
    is port == NULL, so only mess with port->clean_timer when port is not NULL.
    
    Signed-off-by: David L Stevens <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit e506d405ac7d34d03996c97ac68aa2ac010be64a
Author: Thierry Reding <[email protected]>
Date:   Wed Oct 1 13:59:00 2014 +0200

    net: dsa: Fix build warning for !PM_SLEEP
    
    The dsa_switch_suspend() and dsa_switch_resume() functions are only used
    when PM_SLEEP is enabled, so they need #ifdef CONFIG_PM_SLEEP protection
    to avoid a compiler warning.
    
    Signed-off-by: Thierry Reding <[email protected]>
    Acked-by: Florian Fainelli <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 84ac1f2ca41f5888cc995944c073a5220f3ed549
Author: Tanmay Inamdar <[email protected]>
Date:   Fri Sep 26 14:08:25 2014 -0700

    arm64: dts: Add APM X-Gene PCIe device tree nodes
    
    Add the device tree nodes for APM X-Gene PCIe host controller and PCIe
    clock interface.  Since X-Gene SOC supports maximum 5 ports, 5 dts nodes
    are added.
    
    Signed-off-by: Tanmay Inamdar <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>

commit 2896e4418b17363f211e084471b589e3c06a7248
Author: Bjorn Helgaas <[email protected]>
Date:   Wed Oct 1 13:01:35 2014 -0600

    PCI: xgene: Add APM X-Gene PCIe driver
    
    Add the AppliedMicro X-Gene SOC PCIe host controller driver.  The X-Gene
    PCIe controller supports up to 8 lanes and GEN3 speed.  The X-Gene SOC
    supports up to 5 PCIe ports.
    
    [bhelgaas: folded in MAINTAINERS and bindings updates]
    Tested-by: Ming Lei <[email protected]>
    Tested-by: Dann Frazier <[email protected]>
    Signed-off-by: Tanmay Inamdar <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>
    Reviewed-by: Liviu Dudau <[email protected]> (driver)

commit 3c87dcbfb36ce6d3d9087f0163c02ba5690d9a85
Author: Subbaraya Sundeep Bhatta <[email protected]>
Date:   Wed Oct 1 11:01:17 2014 +0200

    net: ll_temac: Remove unnecessary ether_setup after alloc_etherdev
    
    Calling ether_setup is redundant since alloc_etherdev calls it.
    
    Signed-off-by: Subbaraya Sundeep Bhatta <[email protected]>
    Signed-off-by: Michal Simek <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 8493ecca74a7b4a66e19676de1a0f14194179941
Author: Benjamin Tissoires <[email protected]>
Date:   Wed Oct 1 11:59:47 2014 -0400

    HID: uHID: fix excepted report type
    
    When uhid_get_report() or uhid_set_report() are called, they emit on the
    char device a UHID_GET_REPORT or UHID_SET_REPORT message. Then, the
    protocol says that the user space asnwers with UHID_GET_REPORT_REPLY
    or UHID_SET_REPORT_REPLY.
    
    Unfortunatelly, the current code waits for an event of type UHID_GET_REPORT
    or UHID_SET_REPORT instead of the reply one.
    Add 1 to UHID_GET_REPORT or UHID_SET_REPORT to actually wait for the
    reply, and validate the reply.
    
    Signed-off-by: Benjamin Tissoires <[email protected]>
    Reviewed-by: David Herrmann <[email protected]>
    Signed-off-by: Jiri Kosina <[email protected]>

commit c8df6ac9452e8f47a6f660993c526d13e858a6f3
Author: Lucas Stach <[email protected]>
Date:   Tue Sep 30 18:36:27 2014 +0200

    PCI: designware: Remove open-coded bitmap operations
    
    Replace them by using the standard kernel bitmap ops.  No functional
    change, but makes the code a lot cleaner.
    
    Signed-off-by: Lucas Stach <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>
    Reviewed-by: Pratyush Anand <[email protected]>
    Acked-by: Jingoo Han <[email protected]>

commit 2199f0608864cf4e8c93d37842a5ee50c8d79843
Author: Mikulas Patocka <[email protected]>
Date:   Fri Mar 28 15:51:56 2014 -0400

    dm crypt: sort writes
    
    Write requests are sorted in a red-black tree structure and are
    submitted in the sorted order.
    
    In theory the sorting should be performed by the underlying disk
    scheduler, however, in practice the disk scheduler only accepts and
    sorts a finite number of requests.  To allow the sorting of all
    requests, dm-crypt needs to implement its own own sorting.
    
    The overhead associated with rbtree-based sorting is considered
    negligible so it is not used conditionally.  Even on SSD sorting can be
    beneficial since in-order request dispatch promotes lower latency IO
    completion to the upper layers.
    
    Signed-off-by: Mikulas Patocka <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>

commit 648fee35be4c75667aa18bf513f7e7e65c01640b
Author: Mikulas Patocka <[email protected]>
Date:   Fri Mar 28 15:51:56 2014 -0400

    dm crypt: offload writes to thread
    
    Submitting write bios directly in the encryption thread caused serious
    performance degradation.  On a multiprocessor machine, encryption requests
    finish in a different order than they were submitted.  Consequently, write
    requests would be submitted in a different order and it could cause severe
    performance degradation.
    
    Move the submission of write requests to a separate thread so that the
    requests can be sorted before submitting.  But this commit improves
    dm-crypt performance even without having dm-crypt perform request
    sorting (in particular it enables IO schedulers like CFQ to sort more
    effectively).
    
    Note: it is required that a previous commit ("dm crypt: don't allocate
    pages for a partial request") be applied before applying this patch.
    Otherwise, this commit could introduce a crash.
    
    Signed-off-by: Mikulas Patocka <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>

commit 4a0d7e0464226eee625a5b77484c339334453882
Author: Mikulas Patocka <[email protected]>
Date:   Fri Mar 28 15:51:55 2014 -0400

    dm crypt: use unbound workqueue for request processing
    
    Use unbound workqueue so that work is automatically balanced between
    available CPUs.
    
    Signed-off-by: Mikulas Patocka <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>

commit 72bfc40ca3b393cb0bc6b5e2ce364e6c6ce0f390
Author: Mikulas Patocka <[email protected]>
Date:   Thu May 29 14:18:12 2014 -0400

    dm crypt: remove io_pending refcount member from dm_crypt_io
    
    Commit "dm crypt: don't allocate pages for a partial request" changed
    the code to allocate all pages for one request.  There is always just
    one pending request, so the io_pending refcount may be removed.
    
    Signed-off-by: Mikulas Patocka <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>

commit 42196fec8945cc84c032b7f59deaffee82036245
Author: Mikulas Patocka <[email protected]>
Date:   Fri Mar 28 15:51:56 2014 -0400

    dm crypt: remove unused io_pool and _crypt_io_pool
    
    The previous commits ("dm crypt: use per-bio data") and ("dm crypt:
    don't allocate pages for a partial request") stopped using the
    io_pool slab mempool and backing _crypt_io_pool kmem cache.
    
    Signed-off-by: Mikulas Patocka <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>

commit ebfda24b1e1bf483accdb900f8625151d8f01383
Author: Mikulas Patocka <[email protected]>
Date:   Fri Mar 28 15:51:56 2014 -0400

    dm crypt: avoid deadlock in mempools
    
    Fix a theoretical deadlock introduced in the previous commit ("dm crypt:
    don't allocate pages for a partial request").
    
    The function crypt_alloc_buffer may be called concurrently.  If we allocate
    from the mempool concurrently, there is a possibility of deadlock.  For
    example, if we have mempool of 256 pages, two processes, each wanting
    256, pages allocate from the mempool concurrently, it may deadlock in a
    situation where both processes have allocated 128 pages and the mempool
    is exhausted.
    
    In order to avoid such a scenario, we allocate the pages under a mutex.
    
    In order to not degrade performance with excessive locking, we try
    non-blocking allocations without a mutex first and if it fails, we
    fallback to a blocking allocation with a mutex.
    
    Signed-off-by: Mikulas Patocka <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>

commit b9ea7cb3fb237078be400522880932008c630fb7
Author: Mikulas Patocka <[email protected]>
Date:   Fri Mar 28 15:51:56 2014 -0400

    dm crypt: don't allocate pages for a partial request
    
    Change crypt_alloc_buffer so that it only ever allocates pages for a
    full request.
    
    This change is a prerequisite for the commit "dm crypt: offload writes
    to thread".  Which implies this change is effectively required for the
    upcoming cpu parallelization changes.
    
    But this change simplifies the dm-crypt code at the expense of reduced
    throughput in low memory conditions (where allocation for a partial
    request is most useful).
    
    This change also enables the removal of the io_pending refcount.
    
    Note: the next commit ("dm-crypt: avoid deadlock in mempools") is needed
    to fix a theoretical deadlock.
    
    Signed-off-by: Mikulas Patocka <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>

commit 117cd3e12232afea97dd31489fbde8888ad22b3e
Author: Heinz Mauelshagen <[email protected]>
Date:   Wed Sep 24 17:47:19 2014 +0200

    dm raid: add discard support for RAID levels 4, 5 and 6
    
    In case of RAID levels 4, 5 and 6 we have to verify each RAID members'
    ability to zero data on discards to avoid stripe data corruption -- if
    discard_zeroes_data is not set for each RAID member discard support must
    be disabled.
    
    Also add an 'ignore_discard' table argument to the target in order to
    ignore discard processing completely on a RAID array, hence not passing
    down discards to MD personalities.
    
    This 'ignore_discard' control provides the ability to:
    - prohibit discards in case of _potential_ data corruptions in RAID4/5/6
      (e.g. if ability to zero data on discard is flawed in a RAID member)
    - avoid discard processing overhead
    
    Signed-off-by: Heinz Mauelshagen <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>

commit 04c308f43a90a9b3b84c344b324d6af29288da05
Author: Mikulas Patocka <[email protected]>
Date:   Wed Oct 1 13:29:48 2014 -0400

    dm bufio: when done scanning return from __scan immediately
    
    When __scan frees the required number of buffer entries that the
    shrinker requested (nr_to_scan becomes zero) it must return.  Before
    this fix the __scan code exited only the inner loop and continued in the
    outer loop.
    
    Also, move dm_bufio_cond_resched to __scan's inner loop, so that
    iterating the bufio client's lru lists doesn't result in scheduling
    latency.
    
    Reported-by: Joe Thornber <[email protected]>
    Signed-off-by: Mikulas Patocka <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>
    Cc: [email protected] # 3.2+

commit 5ec094057c7df5ff80f5e7fe282f47ad205fb976
Author: Bjorn Helgaas <[email protected]>
Date:   Tue Sep 23 14:38:28 2014 -0600

    PCI/MSI: Remove unnecessary temporary variable
    
    The only use of "status" is to hold a value which is immediately returned,
    so just return and remove the variable directly.
    
    Signed-off-by: Bjorn Helgaas <[email protected]>

commit 56b72b40957947f7c08771f030102351d4c906df
Author: Yijing Wang <[email protected]>
Date:   Mon Sep 29 18:35:16 2014 -0600

    PCI/MSI: Use __write_msi_msg() instead of write_msi_msg()
    
    default_restore_msi_irq() already has the struct msi_desc pointer required
    by __write_msi_msg(), so call it directly instead of having write_msi_msg()
    look it up from the IRQ.
    
    No functional change.
    
    [bhelgaas: split into separate patch]
    Signed-off-by: Yijing Wang <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>

commit 1e8f4cc82eded0c3c97ef6e2f119782e42deda35
Author: Yijing Wang <[email protected]>
Date:   Wed Sep 24 11:09:45 2014 +0800

    MSI/powerpc: Use __read_msi_msg() instead of read_msi_msg()
    
    rtas_setup_msi_irqs() already has the struct msi_desc pointer required by
    __read_msi_msg(), so call it directly instead of having read_msi_msg() look
    it up from the IRQ.
    
    No functional change.
    
    [bhelgaas: changelog]
    Signed-off-by: Yijing Wang <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>
    Acked-by: Michael Ellerman <[email protected]>
    CC: Benjamin Herrenschmidt <[email protected]>
    CC: [email protected]

commit 2b260085e466c345e78f23b1c9ad1d123d509ef8
Author: Yijing Wang <[email protected]>
Date:   Tue Sep 23 13:27:25 2014 +0800

    PCI/MSI: Use __get_cached_msi_msg() instead of get_cached_msi_msg()
    
    Both callers of get_cached_msi_msg() start with a struct irq_data pointer,
    look up the corresponding IRQ number, and pass it to get_cached_msi_msg(),
    which then uses irq_get_irq_data() to look up the struct irq_data again to
    call __get_cached_msi_msg().
    
    Since we already have the struct irq_data, call __get_cached_msi_msg()
    directly and skip the lookup work done by get_cached_msi_msg().
    
    No functional change.
    
    [bhelgaas: changelog]
    Signed-off-by: Yijing Wang <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>
    CC: Tony Luck <[email protected]>
    CC: [email protected]

commit 468ff15a3ab98ed7153c29c68229ffb97f15a251
Author: Yijing Wang <[email protected]>
Date:   Tue Sep 23 13:27:24 2014 +0800

    PCI/MSI: Add "msi_bus" sysfs MSI/MSI-X control for endpoints
    
    The "msi_bus" sysfs file for bridges sets a bus flag to allow or disallow
    future driver requests for MSI or MSI-X.  Previously, the sysfs file
    existed for endpoints but did nothing.
    
    Add "msi_bus" support for endpoints, so an administrator can prevent the
    use of MSI and MSI-X for individual devices.
    
    Note that as for bridges, these changes only affect future driver requests
    for MSI or MSI-X, so drivers may need to be reloaded.
    
    Add documentation for the "msi_bus" sysfs file.
    
    [bhelgaas: changelog, comments, add "subordinate", add endpoint printk,
    rework bus_flags setting, make bus_flags printk unconditional]
    Signed-off-by: Yijing Wang <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>

commit 48c3c38f003c25d50a09d3da558667c5ecd530aa
Author: Yijing Wang <[email protected]>
Date:   Tue Sep 23 11:02:42 2014 -0600

    PCI/MSI: Remove "pos" from the struct msi_desc msi_attrib
    
    "msi_attrib.pos" is only used for MSI (not MSI-X), and we already cache the
    MSI capability offset in "dev->msi_cap".
    
    Remove "pos" from the struct msi_attrib and use "dev->msi_cap" directly.
    
    [bhelgaas: changelog, fix whitespace]
    Signed-off-by: Yijing Wang <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>

commit 81052769e48609525c452d8f078a5786b673e178
Author: Yijing Wang <[email protected]>
Date:   Tue Sep 23 13:27:22 2014 +0800

    PCI/MSI: Remove unused kobject from struct msi_desc
    
    After commit 1c51b50c2995 ("PCI/MSI: Export MSI mode using attributes, not
    kobjects"), the kobject in struct msi_desc is unused.
    
    Remove the unused struct kobject from struct msi_desc.
    
    [bhelgaas: changelog]
    Fixes: 1c51b50c2995 ("PCI/MSI: Export MSI mode using attributes, not kobjects")
    Signed-off-by: Yijing Wang <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>
    Acked-by: Greg Kroah-Hartman <[email protected]>

commit a06cd74cefe754341f747ddc4cf7b0058fa9bff8
Author: Alexander Gordeev <[email protected]>
Date:   Tue Sep 23 12:45:58 2014 -0600

    PCI/MSI: Rename pci_msi_check_device() to pci_msi_supported()
    
    Rename pci_msi_check_device() to pci_msi_supported() for clarity.  Note
    that pci_msi_supported() returns true if MSI/MSI-X is supported, so code
    like:
    
      if (pci_msi_supported(...))
    
    reads naturally.
    
    [bhelgaas: changelog, split to separate patch, reverse sense]
    Signed-off-by: Alexander Gordeev <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>

commit 27e20603c54ba633ed259284d006275f13c9f95b
Author: Alexander Gordeev <[email protected]>
Date:   Tue Sep 23 14:25:11 2014 -0600

    PCI/MSI: Move D0 check into pci_msi_check_device()
    
    Both callers of pci_msi_check_device() check that the device is in D0
    state, so move the check from the callers into pci_msi_check_device()
    itself.
    
    In pci_enable_msi_range(), note that pci_msi_check_device() never returns a
    positive value any more, so the loop that called it until it returns zero
    or negative is no longer necessary.
    
    [bhelgaas: changelog, split to separate patch]
    Signed-off-by: Alexander Gordeev <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>

commit ad975ebad4c3ce8dcc7d0bb4db26ea5aca4cfc99
Author: Alexander Gordeev <[email protected]>
Date:   Tue Sep 23 12:39:54 2014 -0600

    PCI/MSI: Remove arch_msi_check_device()
    
    No architectures implement arch_msi_check_device() or the struct msi_chip
    .check_device() method, so remove them.
    
    Remove the "type" parameter to pci_msi_check_device() because it was only
    used to call arch_msi_check_device() and is no longer needed.
    
    [bhelgaas: changelog, split to separate patch]
    Signed-off-by: Alexander Gordeev <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>

commit 3930115e0dd67f61b3b1882c7a34d0baeff1bb4c
Author: Alexander Gordeev <[email protected]>
Date:   Sun Sep 7 20:57:54 2014 +0200

    irqchip: armada-370-xp: Remove arch_msi_check_device()
    
    Move MSI checks from arch_msi_check_device() to arch_setup_msi_irqs().
    This makes the code more compact and allows removing
    arch_msi_check_device() from generic MSI code.
    
    Tested-by: Thomas Petazzoni <[email protected]>
    Signed-off-by: Alexander Gordeev <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>
    Acked-by: Jason Cooper <[email protected]>
    CC: Thomas Gleixner <[email protected]>

commit 6b2fd7efeb888fa781c1f767de6c36497ac1596b
Author: Alexander Gordeev <[email protected]>
Date:   Sun Sep 7 20:57:53 2014 +0200

    PCI/MSI/PPC: Remove arch_msi_check_device()
    
    Move MSI checks from arch_msi_check_device() to arch_setup_msi_irqs().
    This makes the code more compact and allows removing
    arch_msi_check_device() from generic MSI code.
    
    Signed-off-by: Alexander Gordeev <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>
    Acked-by: Michael Ellerman <[email protected]>

commit 977104ece1568f2e2ad3f5fd8e55bd640e8ab55a
Author: Mark Charlebois <[email protected]>
Date:   Thu Sep 4 14:16:17 2014 -0700

    arm: LLVMLinux: Use global stack register variable for percpu
    
    Using global current_stack_pointer works on both clang and gcc.
    current_stack_pointer is an unsigned long and needs to be cast
    as a pointer to dereference.
    
    KernelVersion: 3.17.0-rc6
    Signed-off-by: Mark Charlebois <[email protected]>
    Signed-off-by: Behan Webster <[email protected]>

commit a35dc594542b29935cd3a92e53233ad4ba4e622f
Author: Behan Webster <[email protected]>
Date:   Tue Sep 3 22:27:27 2013 -0400

    arm: LLVMLinux: Use current_stack_pointer in unwind_backtrace
    
    Use the global current_stack_pointer to get the value of the stack pointer.
    This change supports being able to compile the kernel with both gcc and clang.
    
    KernelVersion: 3.17.0-rc6
    Signed-off-by: Behan Webster <[email protected]>
    Reviewed-by: Mark Charlebois <[email protected]>
    Reviewed-by: Jan-Simon Möller <[email protected]>
    Acked-by: Will Deacon <[email protected]>
    Acked-by: Nicolas Pitre <[email protected]>

commit 5c5da6724d8e1767405a3f4b611451a11ece99e2
Author: Behan Webster <[email protected]>
Date:   Tue Sep 3 22:27:27 2013 -0400

    arm: LLVMLinux: Calculate current_thread_info from current_stack_pointer
    
    Use the global current_stack_pointer to get the value of the stack pointer.
    This change supports being able to compile the kernel with both gcc and clang.
    
    KernelVersion: 3.17.0-rc6
    Signed-off-by: Behan Webster <[email protected]>
    Reviewed-by: Mark Charlebois <[email protected]>
    Reviewed-by: Jan-Simon Möller <[email protected]>
    Acked-by: Will Deacon <[email protected]>
    Acked-by: Nicolas Pitre <[email protected]>

commit f2b6d8c6c56c9a164a2d885ba34a09d613c959c9
Author: Behan Webster <[email protected]>
Date:   Tue Sep 3 22:27:27 2013 -0400

    arm: LLVMLinux: Use current_stack_pointer in save_stack_trace_tsk
    
    Use the global current_stack_pointer to get the value of the stack pointer.
    This change supports being able to compile the kernel with both gcc and clang.
    
    KernelVersion: 3.17.0-rc6
    Signed-off-by: Behan Webster <[email protected]>
    Reviewed-by: Mark Charlebois <[email protected]>
    Reviewed-by: Jan-Simon Möller <[email protected]>
    Acked-by: Will Deacon <[email protected]>
    Acked-by: Nicolas Pitre <[email protected]>

commit 40802b84566a3d9731a8fea43b144301d9ac450d
Author: Behan Webster <[email protected]>
Date:   Tue Sep 3 22:27:27 2013 -0400

    arm: LLVMLinux: Use current_stack_pointer for return_address
    
    Use the global current_stack_pointer to get the value of the stack pointer.
    This change supports being able to compile the kernel with both gcc and Clang.
    
    KernelVersion: 3.17.0-rc6
    Signed-off-by: Behan Webster <[email protected]>
    Reviewed-by: Mark Charlebois <[email protected]>
    Reviewed-by: Jan-Simon Möller <[email protected]>
    Acked-by: Will Deacon <[email protected]>
    Acked-by: Nicolas Pitre <[email protected]>

commit d80ced5236764b8c4ffda5545d5b357cf88c77c1
Author: Behan Webster <[email protected]>
Date:   Tue Sep 3 22:27:27 2013 -0400

    arm: LLVMLinux: Use current_stack_pointer to calculate pt_regs address
    
    Use the global current_stack_pointer to calculate the end of the stack for
    current_pt_regs()
    
    KernelVersion: 3.17.0-rc6
    Signed-off-by: Behan Webster <[email protected]>
    Reviewed-by: Mark Charlebois <[email protected]>
    Reviewed-by: Jan-Simon Möller <[email protected]>
    Acked-by: Will Deacon <[email protected]>
    Acked-by: Nicolas Pitre <[email protected]>

commit 9d0d6994806b36891453beb1e94b6253f853af61
Author: Behan Webster <[email protected]>
Date:   Tue Sep 3 22:27:26 2013 -0400

    arm: LLVMLinux: Add global named register current_stack_pointer for ARM
    
    Define a global named register for current_stack_pointer. The use of this new
    variable guarantees that both gcc and clang can access this register in C code.
    
    KernelVersion: 3.17.0-rc6
    Signed-off-by: Behan Webster <[email protected]>
    Reviewed-by: Jan-Simon Möller <[email protected]>
    Reviewed-by: Mark Charlebois <[email protected]>
    Acked-by: Will Deacon <[email protected]>
    Acked-by: Nicolas Pitre <[email protected]>

commit 2c804d0f8fc7799981d9fdd8c88653541b28c1a7
Author: Eric Dumazet <[email protected]>
Date:   Tue Sep 30 22:12:05 2014 -0700

    ipv4: mentions skb_gro_postpull_rcsum() in inet_gro_receive()
    
    Proper CHECKSUM_COMPLETE support needs to adjust skb->csum
    when we remove one header. Its done using skb_gro_postpull_rcsum()
    
    In the case of IPv4, we know that the adjustment is not really needed,
    because the checksum over IPv4 header is 0. Lets add a comment to
    ease code comprehension and avoid copy/paste errors.
    
    Signed-off-by: Eric Dumazet <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit eb51bbaf8dedf142a54a7ff58514a29b40d515bb
Author: Stephen Rothwell <[email protected]>
Date:   Wed Oct 1 17:00:49 2014 +1000

    fm10k: using vmalloc requires including linux/vmalloc.h
    
    Signed-off-by: Stephen Rothwell <[email protected]>
    Acked-by: Jeff Kirsher <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 078efae00ffc76381c3248006e9cf0988163488f
Author: Anish Bhatt <[email protected]>
Date:   Mon Sep 15 17:44:18 2014 -0700

    [SCSI] cxgb4i: avoid holding mutex in interrupt context
    
    cxgbi_inet6addr_handler() can be called in interrupt context, so use rcu
    protected list while finding netdev.  This is observed as a scheduling in
    atomic oops when running over ipv6.
    
    Fixes: fc8d0590d914 ("libcxgbi: Add ipv6 api to driver")
    Fixes: 759a0cc5a3e1 ("cxgb4i: Add ipv6 code to driver, call into libcxgbi ipv6 api")
    
    Signed-off-by: Anish Bhatt <[email protected]>
    Signed-off-by: Karen Xie <[email protected]>
    Signed-off-by: Christoph Hellwig <[email protected]>
    Signed-off-by: James Bottomley <[email protected]>

commit 34549ab09e62db9703811c6ed4715f2ffa1fd7fb
Author: Jeff Layton <[email protected]>
Date:   Wed Oct 1 08:05:22 2014 -0400

    nfsd: eliminate "to_delegation" define
    
    We now have cb_to_delegation and to_delegation, which do the same thing
    and are defined separately in different .c files. Move the
    cb_to_delegation definition into a header file and eliminate the
    redundant to_delegation definition.
    
    Reviewed-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Jeff Layton <[email protected]>

commit 4a0efdc933680d908de11712a774a2c9492c3d5a
Author: Hannes Reinecke <[email protected]>
Date:   Wed Oct 1 14:32:31 2014 +0200

    block: misplaced rq_complete tracepoint
    
    The rq_complete tracepoint was never issued for empty requests,
    causing the resulting blktrace information to never show any
    completion for those request.
    
    Signed-off-by: Hannes Reinecke <[email protected]>
    Acked-by: Tejun Heo <[email protected]>
    Signed-off-by: Jens Axboe <[email protected]>

commit fc2021fb9baf9ed375c8161b40b68e120e75c60e
Author: Michael Opdenacker <[email protected]>
Date:   Wed Oct 1 12:07:07 2014 +0200

    block: hd: remove deprecated IRQF_DISABLED
    
    This patch removes the use of the IRQF_DISABLED flag
    from drivers/block/hd.c
    
    It's a NOOP since 2.6.35 and it will be removed one day.
    
    This also removes a related comment which is obsolete too.
    
    Signed-off-by: Michael Opdenacker <[email protected]>
    Signed-off-by: Jens Axboe <[email protected]>

commit 19aeb5a65f1a6504fc665466c188241e7393d66f
Author: Bob Peterson <[email protected]>
Date:   Mon Sep 29 08:52:04 2014 -0400

    GFS2: Make rename not save dirent location
    
    This patch fixes a regression in the patch "GFS2: Remember directory
    insert point", commit 2b47dad866d04f14c328f888ba5406057b8c7d33.
    The problem had to do with the rename function: The function found
    space for the new dirent, and remembered that location. But then the
    old dirent was removed, which often moved the eligible location for
    the renamed dirent. Putting the new dirent at the saved location
    caused file system corruption.
    
    This patch adds a new "save_loc" variable to struct gfs2_diradd.
    If 1, the dirent location is saved. If 0, the dirent location is not
    saved and the buffer_head is released as per previous behavior.
    
    Signed-off-by: Bob Peterson <[email protected]>
    Signed-off-by: Steven Whitehouse <[email protected]>

commit 5235166fbc332c8b5dcf49e3a498a8b510a77449
Author: Oliver Neukum <[email protected]>
Date:   Tue Sep 30 12:54:56 2014 +0200

    HID: usbhid: add another mouse that needs QUIRK_ALWAYS_POLL
    
    There is a second mouse sharing the same vendor strings but different IDs.
    
    Signed-off-by: Oliver Neukum <[email protected]>
    Signed-off-by: Jiri Kosina <[email protected]>

commit 2013add4ce73c93ae2148969a9ec3ecc8b1e26fa
Author: Gavin Shan <[email protected]>
Date:   Wed Oct 1 14:34:51 2014 +1000

    powerpc/eeh: Show hex prefix for PE state sysfs
    
    As Michael suggested, the hex prefix for the output of EEH PE
    state sysfs entry (/sys/bus/pci/devices/xxx/eeh_pe_state) is
    always informative to users.
    
    Suggested-by: Michael Ellerman <[email protected]>
    Signed-off-by: Gavin Shan <[email protected]>
    Signed-off-by: Michael Ellerman <[email protected]>

commit 24c20f10583647e30afe87b6f6d5e14bc7b1cbc6
Author: Christoph Hellwig <[email protected]>
Date:   Tue Sep 30 16:43:46 2014 +0200

    scsi: add a CONFIG_SCSI_MQ_DEFAULT option
    
    Add a Kconfig option to enable the blk-mq path for SCSI by default
    to ease testing and deployment in setups that know they benefit
    from blk-mq.
    
    Signed-off-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Martin K. Petersen <[email protected]>
    Reviewed-by: Robert Elliott <[email protected]>
    Tested-by: Robert Elliott <[email protected]>

commit e785060ea3a1c8e37a8bc1449c79e36bff2b5b13
Author: Dolev Raviv <[email protected]>
Date:   Thu Sep 25 15:32:36 2014 +0300

    ufs: definitions for phy interface
    
    - Adding some of the definitions missing in unipro.h, including power
      enumeration.
    - Read Modify Write Line helper function
    - Indication for the type of suspend
    
    Signed-off-by: Dolev Raviv <[email protected]>
    Signed-off-by: Subhash Jadavani <[email protected]>
    Signed-off-by: Yaniv Gardi <[email protected]>
    Signed-off-by: Christoph Hellwig <[email protected]>

commit 374a246e4ebda1fc55d537877bf2412e511ecc7b
Author: Subhash Jadavani <[email protected]>
Date:   Thu Sep 25 15:32:35 2014 +0300

    ufs: tune bkops while power managment events
    
    Add capability to control the auto bkops during suspend.
    If host explicitly enables the auto bkops (background operation) on device
    then only device would perform the bkops on its own. If auto bkops is not
    enabled explicitly and if the device reaches to state where it must do
    background operation, device would raise the urgent bkops exception event
    to host and then host will enable the auto bkops on device. This patch
    adds the option to choose whether auto bkops should be enabled during
    runtime suspend or not. Since we don't want to keep the device active to
    perform the non critical bkops, host will enable urgent bkops only.
    
    Keep auto-bkops enabled after resume if urgent bkops needed.
    If device bkops status shows that its in critical need of executing
    background operations, host should allow the device to continue doing
    background operations.
    
    Signed-off-by: Subhash Jadavani <[email protected]>
    Signed-off-by: Dolev Raviv <[email protected]>
    Signed-off-by: Christoph Hellwig <[email protected]>

commit 856b348305c98d4e0c8e5eafa97c61443197f8d3
Author: Sahitya Tummala <[email protected]>
Date:   Thu Sep 25 15:32:34 2014 +0300

    ufs: Add support for clock scaling using devfreq framework
    
    The clocks for UFS device will be managed by generic DVFS (Dynamic
    Voltage and Frequency Scaling) framework within kernel. This devfreq
    framework works with different governors to scale the clocks. By default,
    UFS devices uses simple_ondemand governor which scales the clocks up if
    the load is more than upthreshold and scales down if the load is less than
    downthreshold.
    
    Signed-off-by: Sahitya Tummala <[email protected]>
    Signed-off-by: Dolev Raviv <[email protected]>
    Signed-off-by: Christoph Hellwig <[email protected]>

commit 4cff6d991e4a291cf50fe2659da2ea9ad46620bf
Author: Sahitya Tummala <[email protected]>
Date:   Thu Sep 25 15:32:33 2014 +0300

    ufs: Add freq-table-hz property for UFS device
    
    Add freq-table-hz propery for UFS device to keep track of
    <min max> frequencies supported by UFS clocks.
    
    Signed-off-by: Sahitya Tummala <[email protected]>
    Signed-off-by: Dolev Raviv <[email protected]>
    Signed-off-by: Christoph Hellwig <[email protected]>

commit 1ab27c9cf8b63dd8dec9e17b5c17721c7f3b6cc7
Author: Sahitya Tummala <[email protected]>
Date:   Thu Sep 25 15:32:32 2014 +0300

    ufs: Add support for clock gating
    
    The UFS controller clocks can be gated after certain period of
    inactivity, which is typically less than runtime suspend timeout.
    In addition to clocks the link will also be put into Hibern8 mode
    to save more power.
    
    The clock gating can be turned on by enabling the capability
    UFSHCD_CAP_CLK_GATING. To enable entering into Hibern8 mode as part of
    clock gating, set the capability UFSHCD_CAP_HIBERN8_WITH_CLK_GATING.
    
    The tracing events for clock gating can be enabled through debugfs as:
    echo 1 > /sys/kernel/debug/tracing/events/ufs/ufshcd_clk_gating/enable
    cat /sys/kernel/debug/tracing/trace_pipe
    
    Signed-off-by: Sahitya Tummala <[email protected]>
    Signed-off-by: Dolev Raviv <[email protected]>
    Signed-off-by: Christoph Hellwig <[email protected]>

commit 7eb584db73bebbc9852a14341431ed6935419bec
Author: Dolev Raviv <[email protected]>
Date:   Thu Sep 25 15:32:31 2014 +0300

    ufs: refactor configuring power mode
    
    Sometimes, the device shall report its maximum power and speed
    capabilities, but we might not wish to configure it to use those
    maximum capabilities.
    This change adds support for the vendor specific host driver to
    implement power change notify callback.
    
    To enable configuring different power modes (number of lanes,
    gear number and fast/slow modes) it is necessary to split the
    configuration stage from the stage that reads the device max power mode.
    In addition, it is not required to read the configuration more than
    once, thus the configuration is stored after reading it once.
    
    Signed-off-by: Dolev Raviv <[email protected]>
    Signed-off-by: Yaniv Gardi <[email protected]>
    Signed-off-by: Christoph Hellwig <[email protected]>

commit 57d104c153d3d6d7bea60089e80f37501851ed2c
Author: Subhash Jadavani <[email protected]>
Date:   Thu Sep 25 15:32:30 2014 +0300

    ufs: add UFS power management support
    
    This patch adds support for UFS device and UniPro link power management
    during runtime/system PM.
    
    Main idea is to define multiple UFS low power levels based on UFS device
    and UFS link power states. This would allow any specific platform or pci
    driver to choose the best suited low power level during runtime and
    system suspend based on their power goals.
    
    bkops handlig:
    To put the UFS device in sleep state when bkops is disabled, first query
    the bkops status from the device and enable bkops on device only if
    device needs time to perform the bkops.
    
    START_STOP handling:
    Before sending START_STOP_UNIT to the device well-known logical unit
    (w-lun) to make sure that the device w-lun unit attention condition is
    cleared.
    
    Write protection:
    UFS device specification allows LUs to be write protected, either
    permanently or power on write protected. If any LU is power on write
    protected and if the card is power cycled (by powering off VCCQ and/or
    VCC rails), LU's write protect status would be lost. So this means those
    LUs can be written now. To ensures that UFS device is power cycled only
    if the power on protect is not set for any of the LUs, check if power on
    write protect is set and if device is in sleep/power-off state & link in
    inactive state (Hibern8 or OFF state).
    If none of the Logical Units on UFS device is power on write protected
    then all UFS device power rails (VCC, VCCQ & VCCQ2) can be turned off if
    UFS device is in power-off state and UFS link is in OFF state. But current
    implementation would disable all device power rails even if UFS link is
    not in OFF state.
    
    Low power mode:
    If UFS link is in OFF state then UFS host controller can be power collapsed
    to avoid leakage current from it. Note that if UFS host controller is power
    collapsed, full UFS reinitialization will be required on resume to
    re-establish the link between host and device.
    
    Signed-off-by: Subhash Jadavani <[email protected]>
    Signed-off-by: Dolev Raviv <[email protected]>
    Signed-off-by: Sujit Reddy Thumma <[email protected]>
    Signed-off-by: Christoph Hellwig <[email protected]>

commit 0ce147d48a3e3352859f0c185e98e8392bee7a25
Author: Subhash Jadavani <[email protected]>
Date:   Thu Sep 25 15:32:29 2014 +0300

    ufs: introduce well known logical unit in ufs
    
    UFS device may have standard LUs and LUN id could be from 0x00 to 0x7F.
    UFS device specification use "Peripheral Device Addressing Format"
    (SCSI SAM-5) for standard LUs.
    
    UFS device may also have the Well Known LUs (also referred as W-LU) which
    again could be from 0x00 to 0x7F. For W-LUs, UFS device specification only
    allows the "Extended Addressing Format" (SCSI SAM-5) which means the W-LUNs
    would start from 0xC100 onwards.
    
    This means max. LUN number reported from UFS device could be 0xC17F hence
    this patch advertise the "max_lun" as 0xC17F which will allow SCSI mid
    layer to detect the W-LUs as well.
    
    But once the W-LUs are detected, UFSHCD driver may get the commands with
    SCSI LUN id upto 0xC17F but UPIU LUN id field is only 8-bit wide so it
    requires the mapping of SCSI LUN id to UPIU LUN id. This patch also add
    support for this mapping.
    
    Signed-off-by: Subhash Jadavani <[email protected]>
    Signed-off-by: Dolev Raviv <[email protected]>
    Signed-off-by: Sujit Reddy Thumma <[email protected]>
    Signed-off-by: Christoph Hellwig <[email protected]>

commit 2a8fa600445c45222632810a4811ce820279d106
Author: Subhash Jadavani <su…
bengal pushed a commit to bengal/linux that referenced this pull request Oct 7, 2014
ERROR: code indent should use tabs where possible
torvalds#37: FILE: include/linux/mmdebug.h:33:
+        do {^I^I^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#37: FILE: include/linux/mmdebug.h:33:
+        do {^I^I^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#38: FILE: include/linux/mmdebug.h:34:
+                if (unlikely(cond)) {^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#38: FILE: include/linux/mmdebug.h:34:
+                if (unlikely(cond)) {^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#39: FILE: include/linux/mmdebug.h:35:
+                        dump_mm(mm);^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#39: FILE: include/linux/mmdebug.h:35:
+                        dump_mm(mm);^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#40: FILE: include/linux/mmdebug.h:36:
+                        BUG();^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#40: FILE: include/linux/mmdebug.h:36:
+                        BUG();^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#41: FILE: include/linux/mmdebug.h:37:
+                }^I^I^I^I^I^I^I\$

WARNING: please, no spaces at the start of a line
torvalds#41: FILE: include/linux/mmdebug.h:37:
+                }^I^I^I^I^I^I^I\$

ERROR: code indent should use tabs where possible
torvalds#42: FILE: include/linux/mmdebug.h:38:
+        } while (0)$

WARNING: please, no spaces at the start of a line
torvalds#42: FILE: include/linux/mmdebug.h:38:
+        } while (0)$

WARNING: Prefer [subsystem eg: netdev]_alert([subsystem]dev, ... then dev_alert(dev, ... then pr_alert(...  to printk(KERN_ALERT ...
torvalds#74: FILE: mm/debug.c:171:
+	printk(KERN_ALERT

total: 6 errors, 7 warnings, 109 lines checked

NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
      scripts/cleanfile

./patches/mm-introduce-vm_bug_on_mm.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Sasha Levin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
luigi311 pushed a commit to luigi311/linux that referenced this pull request Mar 31, 2024
The mlxbf_gige driver encounters a NULL pointer exception in
mlxbf_gige_open() when kdump is enabled.  The sequence to reproduce
the exception is as follows:
a) enable kdump
b) trigger kdump via "echo c > /proc/sysrq-trigger"
c) kdump kernel executes
d) kdump kernel loads mlxbf_gige module
e) the mlxbf_gige module runs its open() as the
   the "oob_net0" interface is brought up
f) mlxbf_gige module will experience an exception
   during its open(), something like:

     Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
     Mem abort info:
       ESR = 0x0000000086000004
       EC = 0x21: IABT (current EL), IL = 32 bits
       SET = 0, FnV = 0
       EA = 0, S1PTW = 0
       FSC = 0x04: level 0 translation fault
     user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000
     [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
     Internal error: Oops: 0000000086000004 [#1] SMP
     CPU: 0 PID: 812 Comm: NetworkManager Tainted: G           OE     5.15.0-1035-bluefield torvalds#37-Ubuntu
     Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024
     pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
     pc : 0x0
     lr : __napi_poll+0x40/0x230
     sp : ffff800008003e00
     x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff
     x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8
     x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000
     x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000
     x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0
     x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c
     x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398
     x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2
     x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100
     x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238
     Call trace:
      0x0
      net_rx_action+0x178/0x360
      __do_softirq+0x15c/0x428
      __irq_exit_rcu+0xac/0xec
      irq_exit+0x18/0x2c
      handle_domain_irq+0x6c/0xa0
      gic_handle_irq+0xec/0x1b0
      call_on_irq_stack+0x20/0x2c
      do_interrupt_handler+0x5c/0x70
      el1_interrupt+0x30/0x50
      el1h_64_irq_handler+0x18/0x2c
      el1h_64_irq+0x7c/0x80
      __setup_irq+0x4c0/0x950
      request_threaded_irq+0xf4/0x1bc
      mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige]
      mlxbf_gige_open+0x5c/0x170 [mlxbf_gige]
      __dev_open+0x100/0x220
      __dev_change_flags+0x16c/0x1f0
      dev_change_flags+0x2c/0x70
      do_setlink+0x220/0xa40
      __rtnl_newlink+0x56c/0x8a0
      rtnl_newlink+0x58/0x84
      rtnetlink_rcv_msg+0x138/0x3c4
      netlink_rcv_skb+0x64/0x130
      rtnetlink_rcv+0x20/0x30
      netlink_unicast+0x2ec/0x360
      netlink_sendmsg+0x278/0x490
      __sock_sendmsg+0x5c/0x6c
      ____sys_sendmsg+0x290/0x2d4
      ___sys_sendmsg+0x84/0xd0
      __sys_sendmsg+0x70/0xd0
      __arm64_sys_sendmsg+0x2c/0x40
      invoke_syscall+0x78/0x100
      el0_svc_common.constprop.0+0x54/0x184
      do_el0_svc+0x30/0xac
      el0_svc+0x48/0x160
      el0t_64_sync_handler+0xa4/0x12c
      el0t_64_sync+0x1a4/0x1a8
     Code: bad PC value
     ---[ end trace 7d1c3f3bf9d81885 ]---
     Kernel panic - not syncing: Oops: Fatal exception in interrupt
     Kernel Offset: 0x2870a7a00000 from 0xffff800008000000
     PHYS_OFFSET: 0x80000000
     CPU features: 0x0,000005c1,a3332a5a
     Memory Limit: none
     ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

The exception happens because there is a pending RX interrupt before the
call to request_irq(RX IRQ) executes.  Then, the RX IRQ handler fires
immediately after this request_irq() completes. The RX IRQ handler runs
"napi_schedule()" before NAPI is fully initialized via "netif_napi_add()"
and "napi_enable()", both which happen later in the open() logic.

The logic in mlxbf_gige_open() must fully initialize NAPI before any calls
to request_irq() execute.

Fixes: f92e186 ("Add Mellanox BlueField Gigabit Ethernet driver")
Signed-off-by: David Thompson <[email protected]>
Reviewed-by: Asmaa Mnebhi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
luigi311 pushed a commit to luigi311/linux that referenced this pull request Mar 31, 2024
The mlxbf_gige driver encounters a NULL pointer exception in
mlxbf_gige_open() when kdump is enabled.  The sequence to reproduce
the exception is as follows:
a) enable kdump
b) trigger kdump via "echo c > /proc/sysrq-trigger"
c) kdump kernel executes
d) kdump kernel loads mlxbf_gige module
e) the mlxbf_gige module runs its open() as the
   the "oob_net0" interface is brought up
f) mlxbf_gige module will experience an exception
   during its open(), something like:

     Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
     Mem abort info:
       ESR = 0x0000000086000004
       EC = 0x21: IABT (current EL), IL = 32 bits
       SET = 0, FnV = 0
       EA = 0, S1PTW = 0
       FSC = 0x04: level 0 translation fault
     user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000
     [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
     Internal error: Oops: 0000000086000004 [#1] SMP
     CPU: 0 PID: 812 Comm: NetworkManager Tainted: G           OE     5.15.0-1035-bluefield torvalds#37-Ubuntu
     Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024
     pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
     pc : 0x0
     lr : __napi_poll+0x40/0x230
     sp : ffff800008003e00
     x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff
     x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8
     x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000
     x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000
     x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0
     x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c
     x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398
     x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2
     x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100
     x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238
     Call trace:
      0x0
      net_rx_action+0x178/0x360
      __do_softirq+0x15c/0x428
      __irq_exit_rcu+0xac/0xec
      irq_exit+0x18/0x2c
      handle_domain_irq+0x6c/0xa0
      gic_handle_irq+0xec/0x1b0
      call_on_irq_stack+0x20/0x2c
      do_interrupt_handler+0x5c/0x70
      el1_interrupt+0x30/0x50
      el1h_64_irq_handler+0x18/0x2c
      el1h_64_irq+0x7c/0x80
      __setup_irq+0x4c0/0x950
      request_threaded_irq+0xf4/0x1bc
      mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige]
      mlxbf_gige_open+0x5c/0x170 [mlxbf_gige]
      __dev_open+0x100/0x220
      __dev_change_flags+0x16c/0x1f0
      dev_change_flags+0x2c/0x70
      do_setlink+0x220/0xa40
      __rtnl_newlink+0x56c/0x8a0
      rtnl_newlink+0x58/0x84
      rtnetlink_rcv_msg+0x138/0x3c4
      netlink_rcv_skb+0x64/0x130
      rtnetlink_rcv+0x20/0x30
      netlink_unicast+0x2ec/0x360
      netlink_sendmsg+0x278/0x490
      __sock_sendmsg+0x5c/0x6c
      ____sys_sendmsg+0x290/0x2d4
      ___sys_sendmsg+0x84/0xd0
      __sys_sendmsg+0x70/0xd0
      __arm64_sys_sendmsg+0x2c/0x40
      invoke_syscall+0x78/0x100
      el0_svc_common.constprop.0+0x54/0x184
      do_el0_svc+0x30/0xac
      el0_svc+0x48/0x160
      el0t_64_sync_handler+0xa4/0x12c
      el0t_64_sync+0x1a4/0x1a8
     Code: bad PC value
     ---[ end trace 7d1c3f3bf9d81885 ]---
     Kernel panic - not syncing: Oops: Fatal exception in interrupt
     Kernel Offset: 0x2870a7a00000 from 0xffff800008000000
     PHYS_OFFSET: 0x80000000
     CPU features: 0x0,000005c1,a3332a5a
     Memory Limit: none
     ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

The exception happens because there is a pending RX interrupt before the
call to request_irq(RX IRQ) executes.  Then, the RX IRQ handler fires
immediately after this request_irq() completes. The RX IRQ handler runs
"napi_schedule()" before NAPI is fully initialized via "netif_napi_add()"
and "napi_enable()", both which happen later in the open() logic.

The logic in mlxbf_gige_open() must fully initialize NAPI before any calls
to request_irq() execute.

Fixes: f92e186 ("Add Mellanox BlueField Gigabit Ethernet driver")
Signed-off-by: David Thompson <[email protected]>
Reviewed-by: Asmaa Mnebhi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Kaz205 pushed a commit to Kaz205/linux that referenced this pull request Apr 8, 2024
[ Upstream commit f7442a6 ]

The mlxbf_gige driver encounters a NULL pointer exception in
mlxbf_gige_open() when kdump is enabled.  The sequence to reproduce
the exception is as follows:
a) enable kdump
b) trigger kdump via "echo c > /proc/sysrq-trigger"
c) kdump kernel executes
d) kdump kernel loads mlxbf_gige module
e) the mlxbf_gige module runs its open() as the
   the "oob_net0" interface is brought up
f) mlxbf_gige module will experience an exception
   during its open(), something like:

     Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
     Mem abort info:
       ESR = 0x0000000086000004
       EC = 0x21: IABT (current EL), IL = 32 bits
       SET = 0, FnV = 0
       EA = 0, S1PTW = 0
       FSC = 0x04: level 0 translation fault
     user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000
     [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
     Internal error: Oops: 0000000086000004 [#1] SMP
     CPU: 0 PID: 812 Comm: NetworkManager Tainted: G           OE     5.15.0-1035-bluefield torvalds#37-Ubuntu
     Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024
     pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
     pc : 0x0
     lr : __napi_poll+0x40/0x230
     sp : ffff800008003e00
     x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff
     x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8
     x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000
     x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000
     x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0
     x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c
     x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398
     x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2
     x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100
     x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238
     Call trace:
      0x0
      net_rx_action+0x178/0x360
      __do_softirq+0x15c/0x428
      __irq_exit_rcu+0xac/0xec
      irq_exit+0x18/0x2c
      handle_domain_irq+0x6c/0xa0
      gic_handle_irq+0xec/0x1b0
      call_on_irq_stack+0x20/0x2c
      do_interrupt_handler+0x5c/0x70
      el1_interrupt+0x30/0x50
      el1h_64_irq_handler+0x18/0x2c
      el1h_64_irq+0x7c/0x80
      __setup_irq+0x4c0/0x950
      request_threaded_irq+0xf4/0x1bc
      mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige]
      mlxbf_gige_open+0x5c/0x170 [mlxbf_gige]
      __dev_open+0x100/0x220
      __dev_change_flags+0x16c/0x1f0
      dev_change_flags+0x2c/0x70
      do_setlink+0x220/0xa40
      __rtnl_newlink+0x56c/0x8a0
      rtnl_newlink+0x58/0x84
      rtnetlink_rcv_msg+0x138/0x3c4
      netlink_rcv_skb+0x64/0x130
      rtnetlink_rcv+0x20/0x30
      netlink_unicast+0x2ec/0x360
      netlink_sendmsg+0x278/0x490
      __sock_sendmsg+0x5c/0x6c
      ____sys_sendmsg+0x290/0x2d4
      ___sys_sendmsg+0x84/0xd0
      __sys_sendmsg+0x70/0xd0
      __arm64_sys_sendmsg+0x2c/0x40
      invoke_syscall+0x78/0x100
      el0_svc_common.constprop.0+0x54/0x184
      do_el0_svc+0x30/0xac
      el0_svc+0x48/0x160
      el0t_64_sync_handler+0xa4/0x12c
      el0t_64_sync+0x1a4/0x1a8
     Code: bad PC value
     ---[ end trace 7d1c3f3bf9d81885 ]---
     Kernel panic - not syncing: Oops: Fatal exception in interrupt
     Kernel Offset: 0x2870a7a00000 from 0xffff800008000000
     PHYS_OFFSET: 0x80000000
     CPU features: 0x0,000005c1,a3332a5a
     Memory Limit: none
     ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

The exception happens because there is a pending RX interrupt before the
call to request_irq(RX IRQ) executes.  Then, the RX IRQ handler fires
immediately after this request_irq() completes. The RX IRQ handler runs
"napi_schedule()" before NAPI is fully initialized via "netif_napi_add()"
and "napi_enable()", both which happen later in the open() logic.

The logic in mlxbf_gige_open() must fully initialize NAPI before any calls
to request_irq() execute.

Fixes: f92e186 ("Add Mellanox BlueField Gigabit Ethernet driver")
Signed-off-by: David Thompson <[email protected]>
Reviewed-by: Asmaa Mnebhi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Apr 9, 2024
[ Upstream commit f7442a6 ]

The mlxbf_gige driver encounters a NULL pointer exception in
mlxbf_gige_open() when kdump is enabled.  The sequence to reproduce
the exception is as follows:
a) enable kdump
b) trigger kdump via "echo c > /proc/sysrq-trigger"
c) kdump kernel executes
d) kdump kernel loads mlxbf_gige module
e) the mlxbf_gige module runs its open() as the
   the "oob_net0" interface is brought up
f) mlxbf_gige module will experience an exception
   during its open(), something like:

     Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
     Mem abort info:
       ESR = 0x0000000086000004
       EC = 0x21: IABT (current EL), IL = 32 bits
       SET = 0, FnV = 0
       EA = 0, S1PTW = 0
       FSC = 0x04: level 0 translation fault
     user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000
     [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
     Internal error: Oops: 0000000086000004 [#1] SMP
     CPU: 0 PID: 812 Comm: NetworkManager Tainted: G           OE     5.15.0-1035-bluefield torvalds#37-Ubuntu
     Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024
     pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
     pc : 0x0
     lr : __napi_poll+0x40/0x230
     sp : ffff800008003e00
     x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff
     x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8
     x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000
     x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000
     x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0
     x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c
     x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398
     x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2
     x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100
     x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238
     Call trace:
      0x0
      net_rx_action+0x178/0x360
      __do_softirq+0x15c/0x428
      __irq_exit_rcu+0xac/0xec
      irq_exit+0x18/0x2c
      handle_domain_irq+0x6c/0xa0
      gic_handle_irq+0xec/0x1b0
      call_on_irq_stack+0x20/0x2c
      do_interrupt_handler+0x5c/0x70
      el1_interrupt+0x30/0x50
      el1h_64_irq_handler+0x18/0x2c
      el1h_64_irq+0x7c/0x80
      __setup_irq+0x4c0/0x950
      request_threaded_irq+0xf4/0x1bc
      mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige]
      mlxbf_gige_open+0x5c/0x170 [mlxbf_gige]
      __dev_open+0x100/0x220
      __dev_change_flags+0x16c/0x1f0
      dev_change_flags+0x2c/0x70
      do_setlink+0x220/0xa40
      __rtnl_newlink+0x56c/0x8a0
      rtnl_newlink+0x58/0x84
      rtnetlink_rcv_msg+0x138/0x3c4
      netlink_rcv_skb+0x64/0x130
      rtnetlink_rcv+0x20/0x30
      netlink_unicast+0x2ec/0x360
      netlink_sendmsg+0x278/0x490
      __sock_sendmsg+0x5c/0x6c
      ____sys_sendmsg+0x290/0x2d4
      ___sys_sendmsg+0x84/0xd0
      __sys_sendmsg+0x70/0xd0
      __arm64_sys_sendmsg+0x2c/0x40
      invoke_syscall+0x78/0x100
      el0_svc_common.constprop.0+0x54/0x184
      do_el0_svc+0x30/0xac
      el0_svc+0x48/0x160
      el0t_64_sync_handler+0xa4/0x12c
      el0t_64_sync+0x1a4/0x1a8
     Code: bad PC value
     ---[ end trace 7d1c3f3bf9d81885 ]---
     Kernel panic - not syncing: Oops: Fatal exception in interrupt
     Kernel Offset: 0x2870a7a00000 from 0xffff800008000000
     PHYS_OFFSET: 0x80000000
     CPU features: 0x0,000005c1,a3332a5a
     Memory Limit: none
     ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

The exception happens because there is a pending RX interrupt before the
call to request_irq(RX IRQ) executes.  Then, the RX IRQ handler fires
immediately after this request_irq() completes. The RX IRQ handler runs
"napi_schedule()" before NAPI is fully initialized via "netif_napi_add()"
and "napi_enable()", both which happen later in the open() logic.

The logic in mlxbf_gige_open() must fully initialize NAPI before any calls
to request_irq() execute.

Fixes: f92e186 ("Add Mellanox BlueField Gigabit Ethernet driver")
Signed-off-by: David Thompson <[email protected]>
Reviewed-by: Asmaa Mnebhi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Apr 9, 2024
[ Upstream commit f7442a6 ]

The mlxbf_gige driver encounters a NULL pointer exception in
mlxbf_gige_open() when kdump is enabled.  The sequence to reproduce
the exception is as follows:
a) enable kdump
b) trigger kdump via "echo c > /proc/sysrq-trigger"
c) kdump kernel executes
d) kdump kernel loads mlxbf_gige module
e) the mlxbf_gige module runs its open() as the
   the "oob_net0" interface is brought up
f) mlxbf_gige module will experience an exception
   during its open(), something like:

     Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
     Mem abort info:
       ESR = 0x0000000086000004
       EC = 0x21: IABT (current EL), IL = 32 bits
       SET = 0, FnV = 0
       EA = 0, S1PTW = 0
       FSC = 0x04: level 0 translation fault
     user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000
     [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
     Internal error: Oops: 0000000086000004 [#1] SMP
     CPU: 0 PID: 812 Comm: NetworkManager Tainted: G           OE     5.15.0-1035-bluefield torvalds#37-Ubuntu
     Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024
     pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
     pc : 0x0
     lr : __napi_poll+0x40/0x230
     sp : ffff800008003e00
     x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff
     x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8
     x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000
     x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000
     x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0
     x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c
     x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398
     x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2
     x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100
     x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238
     Call trace:
      0x0
      net_rx_action+0x178/0x360
      __do_softirq+0x15c/0x428
      __irq_exit_rcu+0xac/0xec
      irq_exit+0x18/0x2c
      handle_domain_irq+0x6c/0xa0
      gic_handle_irq+0xec/0x1b0
      call_on_irq_stack+0x20/0x2c
      do_interrupt_handler+0x5c/0x70
      el1_interrupt+0x30/0x50
      el1h_64_irq_handler+0x18/0x2c
      el1h_64_irq+0x7c/0x80
      __setup_irq+0x4c0/0x950
      request_threaded_irq+0xf4/0x1bc
      mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige]
      mlxbf_gige_open+0x5c/0x170 [mlxbf_gige]
      __dev_open+0x100/0x220
      __dev_change_flags+0x16c/0x1f0
      dev_change_flags+0x2c/0x70
      do_setlink+0x220/0xa40
      __rtnl_newlink+0x56c/0x8a0
      rtnl_newlink+0x58/0x84
      rtnetlink_rcv_msg+0x138/0x3c4
      netlink_rcv_skb+0x64/0x130
      rtnetlink_rcv+0x20/0x30
      netlink_unicast+0x2ec/0x360
      netlink_sendmsg+0x278/0x490
      __sock_sendmsg+0x5c/0x6c
      ____sys_sendmsg+0x290/0x2d4
      ___sys_sendmsg+0x84/0xd0
      __sys_sendmsg+0x70/0xd0
      __arm64_sys_sendmsg+0x2c/0x40
      invoke_syscall+0x78/0x100
      el0_svc_common.constprop.0+0x54/0x184
      do_el0_svc+0x30/0xac
      el0_svc+0x48/0x160
      el0t_64_sync_handler+0xa4/0x12c
      el0t_64_sync+0x1a4/0x1a8
     Code: bad PC value
     ---[ end trace 7d1c3f3bf9d81885 ]---
     Kernel panic - not syncing: Oops: Fatal exception in interrupt
     Kernel Offset: 0x2870a7a00000 from 0xffff800008000000
     PHYS_OFFSET: 0x80000000
     CPU features: 0x0,000005c1,a3332a5a
     Memory Limit: none
     ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

The exception happens because there is a pending RX interrupt before the
call to request_irq(RX IRQ) executes.  Then, the RX IRQ handler fires
immediately after this request_irq() completes. The RX IRQ handler runs
"napi_schedule()" before NAPI is fully initialized via "netif_napi_add()"
and "napi_enable()", both which happen later in the open() logic.

The logic in mlxbf_gige_open() must fully initialize NAPI before any calls
to request_irq() execute.

Fixes: f92e186 ("Add Mellanox BlueField Gigabit Ethernet driver")
Signed-off-by: David Thompson <[email protected]>
Reviewed-by: Asmaa Mnebhi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Kaz205 pushed a commit to Kaz205/linux that referenced this pull request Apr 9, 2024
[ Upstream commit f7442a6 ]

The mlxbf_gige driver encounters a NULL pointer exception in
mlxbf_gige_open() when kdump is enabled.  The sequence to reproduce
the exception is as follows:
a) enable kdump
b) trigger kdump via "echo c > /proc/sysrq-trigger"
c) kdump kernel executes
d) kdump kernel loads mlxbf_gige module
e) the mlxbf_gige module runs its open() as the
   the "oob_net0" interface is brought up
f) mlxbf_gige module will experience an exception
   during its open(), something like:

     Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
     Mem abort info:
       ESR = 0x0000000086000004
       EC = 0x21: IABT (current EL), IL = 32 bits
       SET = 0, FnV = 0
       EA = 0, S1PTW = 0
       FSC = 0x04: level 0 translation fault
     user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000
     [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
     Internal error: Oops: 0000000086000004 [#1] SMP
     CPU: 0 PID: 812 Comm: NetworkManager Tainted: G           OE     5.15.0-1035-bluefield torvalds#37-Ubuntu
     Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024
     pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
     pc : 0x0
     lr : __napi_poll+0x40/0x230
     sp : ffff800008003e00
     x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff
     x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8
     x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000
     x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000
     x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0
     x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c
     x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398
     x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2
     x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100
     x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238
     Call trace:
      0x0
      net_rx_action+0x178/0x360
      __do_softirq+0x15c/0x428
      __irq_exit_rcu+0xac/0xec
      irq_exit+0x18/0x2c
      handle_domain_irq+0x6c/0xa0
      gic_handle_irq+0xec/0x1b0
      call_on_irq_stack+0x20/0x2c
      do_interrupt_handler+0x5c/0x70
      el1_interrupt+0x30/0x50
      el1h_64_irq_handler+0x18/0x2c
      el1h_64_irq+0x7c/0x80
      __setup_irq+0x4c0/0x950
      request_threaded_irq+0xf4/0x1bc
      mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige]
      mlxbf_gige_open+0x5c/0x170 [mlxbf_gige]
      __dev_open+0x100/0x220
      __dev_change_flags+0x16c/0x1f0
      dev_change_flags+0x2c/0x70
      do_setlink+0x220/0xa40
      __rtnl_newlink+0x56c/0x8a0
      rtnl_newlink+0x58/0x84
      rtnetlink_rcv_msg+0x138/0x3c4
      netlink_rcv_skb+0x64/0x130
      rtnetlink_rcv+0x20/0x30
      netlink_unicast+0x2ec/0x360
      netlink_sendmsg+0x278/0x490
      __sock_sendmsg+0x5c/0x6c
      ____sys_sendmsg+0x290/0x2d4
      ___sys_sendmsg+0x84/0xd0
      __sys_sendmsg+0x70/0xd0
      __arm64_sys_sendmsg+0x2c/0x40
      invoke_syscall+0x78/0x100
      el0_svc_common.constprop.0+0x54/0x184
      do_el0_svc+0x30/0xac
      el0_svc+0x48/0x160
      el0t_64_sync_handler+0xa4/0x12c
      el0t_64_sync+0x1a4/0x1a8
     Code: bad PC value
     ---[ end trace 7d1c3f3bf9d81885 ]---
     Kernel panic - not syncing: Oops: Fatal exception in interrupt
     Kernel Offset: 0x2870a7a00000 from 0xffff800008000000
     PHYS_OFFSET: 0x80000000
     CPU features: 0x0,000005c1,a3332a5a
     Memory Limit: none
     ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

The exception happens because there is a pending RX interrupt before the
call to request_irq(RX IRQ) executes.  Then, the RX IRQ handler fires
immediately after this request_irq() completes. The RX IRQ handler runs
"napi_schedule()" before NAPI is fully initialized via "netif_napi_add()"
and "napi_enable()", both which happen later in the open() logic.

The logic in mlxbf_gige_open() must fully initialize NAPI before any calls
to request_irq() execute.

Fixes: f92e186 ("Add Mellanox BlueField Gigabit Ethernet driver")
Signed-off-by: David Thompson <[email protected]>
Reviewed-by: Asmaa Mnebhi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
heiher pushed a commit to heiher/linux that referenced this pull request Apr 10, 2024
[ Upstream commit f7442a6 ]

The mlxbf_gige driver encounters a NULL pointer exception in
mlxbf_gige_open() when kdump is enabled.  The sequence to reproduce
the exception is as follows:
a) enable kdump
b) trigger kdump via "echo c > /proc/sysrq-trigger"
c) kdump kernel executes
d) kdump kernel loads mlxbf_gige module
e) the mlxbf_gige module runs its open() as the
   the "oob_net0" interface is brought up
f) mlxbf_gige module will experience an exception
   during its open(), something like:

     Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
     Mem abort info:
       ESR = 0x0000000086000004
       EC = 0x21: IABT (current EL), IL = 32 bits
       SET = 0, FnV = 0
       EA = 0, S1PTW = 0
       FSC = 0x04: level 0 translation fault
     user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000
     [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
     Internal error: Oops: 0000000086000004 [#1] SMP
     CPU: 0 PID: 812 Comm: NetworkManager Tainted: G           OE     5.15.0-1035-bluefield torvalds#37-Ubuntu
     Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024
     pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
     pc : 0x0
     lr : __napi_poll+0x40/0x230
     sp : ffff800008003e00
     x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff
     x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8
     x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000
     x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000
     x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0
     x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c
     x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398
     x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2
     x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100
     x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238
     Call trace:
      0x0
      net_rx_action+0x178/0x360
      __do_softirq+0x15c/0x428
      __irq_exit_rcu+0xac/0xec
      irq_exit+0x18/0x2c
      handle_domain_irq+0x6c/0xa0
      gic_handle_irq+0xec/0x1b0
      call_on_irq_stack+0x20/0x2c
      do_interrupt_handler+0x5c/0x70
      el1_interrupt+0x30/0x50
      el1h_64_irq_handler+0x18/0x2c
      el1h_64_irq+0x7c/0x80
      __setup_irq+0x4c0/0x950
      request_threaded_irq+0xf4/0x1bc
      mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige]
      mlxbf_gige_open+0x5c/0x170 [mlxbf_gige]
      __dev_open+0x100/0x220
      __dev_change_flags+0x16c/0x1f0
      dev_change_flags+0x2c/0x70
      do_setlink+0x220/0xa40
      __rtnl_newlink+0x56c/0x8a0
      rtnl_newlink+0x58/0x84
      rtnetlink_rcv_msg+0x138/0x3c4
      netlink_rcv_skb+0x64/0x130
      rtnetlink_rcv+0x20/0x30
      netlink_unicast+0x2ec/0x360
      netlink_sendmsg+0x278/0x490
      __sock_sendmsg+0x5c/0x6c
      ____sys_sendmsg+0x290/0x2d4
      ___sys_sendmsg+0x84/0xd0
      __sys_sendmsg+0x70/0xd0
      __arm64_sys_sendmsg+0x2c/0x40
      invoke_syscall+0x78/0x100
      el0_svc_common.constprop.0+0x54/0x184
      do_el0_svc+0x30/0xac
      el0_svc+0x48/0x160
      el0t_64_sync_handler+0xa4/0x12c
      el0t_64_sync+0x1a4/0x1a8
     Code: bad PC value
     ---[ end trace 7d1c3f3bf9d81885 ]---
     Kernel panic - not syncing: Oops: Fatal exception in interrupt
     Kernel Offset: 0x2870a7a00000 from 0xffff800008000000
     PHYS_OFFSET: 0x80000000
     CPU features: 0x0,000005c1,a3332a5a
     Memory Limit: none
     ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

The exception happens because there is a pending RX interrupt before the
call to request_irq(RX IRQ) executes.  Then, the RX IRQ handler fires
immediately after this request_irq() completes. The RX IRQ handler runs
"napi_schedule()" before NAPI is fully initialized via "netif_napi_add()"
and "napi_enable()", both which happen later in the open() logic.

The logic in mlxbf_gige_open() must fully initialize NAPI before any calls
to request_irq() execute.

Fixes: f92e186 ("Add Mellanox BlueField Gigabit Ethernet driver")
Signed-off-by: David Thompson <[email protected]>
Reviewed-by: Asmaa Mnebhi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request Apr 10, 2024
[ Upstream commit f7442a6 ]

The mlxbf_gige driver encounters a NULL pointer exception in
mlxbf_gige_open() when kdump is enabled.  The sequence to reproduce
the exception is as follows:
a) enable kdump
b) trigger kdump via "echo c > /proc/sysrq-trigger"
c) kdump kernel executes
d) kdump kernel loads mlxbf_gige module
e) the mlxbf_gige module runs its open() as the
   the "oob_net0" interface is brought up
f) mlxbf_gige module will experience an exception
   during its open(), something like:

     Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
     Mem abort info:
       ESR = 0x0000000086000004
       EC = 0x21: IABT (current EL), IL = 32 bits
       SET = 0, FnV = 0
       EA = 0, S1PTW = 0
       FSC = 0x04: level 0 translation fault
     user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000
     [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
     Internal error: Oops: 0000000086000004 [#1] SMP
     CPU: 0 PID: 812 Comm: NetworkManager Tainted: G           OE     5.15.0-1035-bluefield torvalds#37-Ubuntu
     Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024
     pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
     pc : 0x0
     lr : __napi_poll+0x40/0x230
     sp : ffff800008003e00
     x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff
     x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8
     x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000
     x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000
     x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0
     x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c
     x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398
     x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2
     x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100
     x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238
     Call trace:
      0x0
      net_rx_action+0x178/0x360
      __do_softirq+0x15c/0x428
      __irq_exit_rcu+0xac/0xec
      irq_exit+0x18/0x2c
      handle_domain_irq+0x6c/0xa0
      gic_handle_irq+0xec/0x1b0
      call_on_irq_stack+0x20/0x2c
      do_interrupt_handler+0x5c/0x70
      el1_interrupt+0x30/0x50
      el1h_64_irq_handler+0x18/0x2c
      el1h_64_irq+0x7c/0x80
      __setup_irq+0x4c0/0x950
      request_threaded_irq+0xf4/0x1bc
      mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige]
      mlxbf_gige_open+0x5c/0x170 [mlxbf_gige]
      __dev_open+0x100/0x220
      __dev_change_flags+0x16c/0x1f0
      dev_change_flags+0x2c/0x70
      do_setlink+0x220/0xa40
      __rtnl_newlink+0x56c/0x8a0
      rtnl_newlink+0x58/0x84
      rtnetlink_rcv_msg+0x138/0x3c4
      netlink_rcv_skb+0x64/0x130
      rtnetlink_rcv+0x20/0x30
      netlink_unicast+0x2ec/0x360
      netlink_sendmsg+0x278/0x490
      __sock_sendmsg+0x5c/0x6c
      ____sys_sendmsg+0x290/0x2d4
      ___sys_sendmsg+0x84/0xd0
      __sys_sendmsg+0x70/0xd0
      __arm64_sys_sendmsg+0x2c/0x40
      invoke_syscall+0x78/0x100
      el0_svc_common.constprop.0+0x54/0x184
      do_el0_svc+0x30/0xac
      el0_svc+0x48/0x160
      el0t_64_sync_handler+0xa4/0x12c
      el0t_64_sync+0x1a4/0x1a8
     Code: bad PC value
     ---[ end trace 7d1c3f3bf9d81885 ]---
     Kernel panic - not syncing: Oops: Fatal exception in interrupt
     Kernel Offset: 0x2870a7a00000 from 0xffff800008000000
     PHYS_OFFSET: 0x80000000
     CPU features: 0x0,000005c1,a3332a5a
     Memory Limit: none
     ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

The exception happens because there is a pending RX interrupt before the
call to request_irq(RX IRQ) executes.  Then, the RX IRQ handler fires
immediately after this request_irq() completes. The RX IRQ handler runs
"napi_schedule()" before NAPI is fully initialized via "netif_napi_add()"
and "napi_enable()", both which happen later in the open() logic.

The logic in mlxbf_gige_open() must fully initialize NAPI before any calls
to request_irq() execute.

Fixes: f92e186 ("Add Mellanox BlueField Gigabit Ethernet driver")
Signed-off-by: David Thompson <[email protected]>
Reviewed-by: Asmaa Mnebhi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request Apr 10, 2024
[ Upstream commit f7442a6 ]

The mlxbf_gige driver encounters a NULL pointer exception in
mlxbf_gige_open() when kdump is enabled.  The sequence to reproduce
the exception is as follows:
a) enable kdump
b) trigger kdump via "echo c > /proc/sysrq-trigger"
c) kdump kernel executes
d) kdump kernel loads mlxbf_gige module
e) the mlxbf_gige module runs its open() as the
   the "oob_net0" interface is brought up
f) mlxbf_gige module will experience an exception
   during its open(), something like:

     Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
     Mem abort info:
       ESR = 0x0000000086000004
       EC = 0x21: IABT (current EL), IL = 32 bits
       SET = 0, FnV = 0
       EA = 0, S1PTW = 0
       FSC = 0x04: level 0 translation fault
     user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000
     [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
     Internal error: Oops: 0000000086000004 [#1] SMP
     CPU: 0 PID: 812 Comm: NetworkManager Tainted: G           OE     5.15.0-1035-bluefield torvalds#37-Ubuntu
     Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024
     pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
     pc : 0x0
     lr : __napi_poll+0x40/0x230
     sp : ffff800008003e00
     x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff
     x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8
     x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000
     x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000
     x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0
     x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c
     x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398
     x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2
     x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100
     x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238
     Call trace:
      0x0
      net_rx_action+0x178/0x360
      __do_softirq+0x15c/0x428
      __irq_exit_rcu+0xac/0xec
      irq_exit+0x18/0x2c
      handle_domain_irq+0x6c/0xa0
      gic_handle_irq+0xec/0x1b0
      call_on_irq_stack+0x20/0x2c
      do_interrupt_handler+0x5c/0x70
      el1_interrupt+0x30/0x50
      el1h_64_irq_handler+0x18/0x2c
      el1h_64_irq+0x7c/0x80
      __setup_irq+0x4c0/0x950
      request_threaded_irq+0xf4/0x1bc
      mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige]
      mlxbf_gige_open+0x5c/0x170 [mlxbf_gige]
      __dev_open+0x100/0x220
      __dev_change_flags+0x16c/0x1f0
      dev_change_flags+0x2c/0x70
      do_setlink+0x220/0xa40
      __rtnl_newlink+0x56c/0x8a0
      rtnl_newlink+0x58/0x84
      rtnetlink_rcv_msg+0x138/0x3c4
      netlink_rcv_skb+0x64/0x130
      rtnetlink_rcv+0x20/0x30
      netlink_unicast+0x2ec/0x360
      netlink_sendmsg+0x278/0x490
      __sock_sendmsg+0x5c/0x6c
      ____sys_sendmsg+0x290/0x2d4
      ___sys_sendmsg+0x84/0xd0
      __sys_sendmsg+0x70/0xd0
      __arm64_sys_sendmsg+0x2c/0x40
      invoke_syscall+0x78/0x100
      el0_svc_common.constprop.0+0x54/0x184
      do_el0_svc+0x30/0xac
      el0_svc+0x48/0x160
      el0t_64_sync_handler+0xa4/0x12c
      el0t_64_sync+0x1a4/0x1a8
     Code: bad PC value
     ---[ end trace 7d1c3f3bf9d81885 ]---
     Kernel panic - not syncing: Oops: Fatal exception in interrupt
     Kernel Offset: 0x2870a7a00000 from 0xffff800008000000
     PHYS_OFFSET: 0x80000000
     CPU features: 0x0,000005c1,a3332a5a
     Memory Limit: none
     ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

The exception happens because there is a pending RX interrupt before the
call to request_irq(RX IRQ) executes.  Then, the RX IRQ handler fires
immediately after this request_irq() completes. The RX IRQ handler runs
"napi_schedule()" before NAPI is fully initialized via "netif_napi_add()"
and "napi_enable()", both which happen later in the open() logic.

The logic in mlxbf_gige_open() must fully initialize NAPI before any calls
to request_irq() execute.

Fixes: f92e186 ("Add Mellanox BlueField Gigabit Ethernet driver")
Signed-off-by: David Thompson <[email protected]>
Reviewed-by: Asmaa Mnebhi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request Apr 10, 2024
[ Upstream commit f7442a6 ]

The mlxbf_gige driver encounters a NULL pointer exception in
mlxbf_gige_open() when kdump is enabled.  The sequence to reproduce
the exception is as follows:
a) enable kdump
b) trigger kdump via "echo c > /proc/sysrq-trigger"
c) kdump kernel executes
d) kdump kernel loads mlxbf_gige module
e) the mlxbf_gige module runs its open() as the
   the "oob_net0" interface is brought up
f) mlxbf_gige module will experience an exception
   during its open(), something like:

     Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
     Mem abort info:
       ESR = 0x0000000086000004
       EC = 0x21: IABT (current EL), IL = 32 bits
       SET = 0, FnV = 0
       EA = 0, S1PTW = 0
       FSC = 0x04: level 0 translation fault
     user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000
     [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
     Internal error: Oops: 0000000086000004 [#1] SMP
     CPU: 0 PID: 812 Comm: NetworkManager Tainted: G           OE     5.15.0-1035-bluefield torvalds#37-Ubuntu
     Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024
     pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
     pc : 0x0
     lr : __napi_poll+0x40/0x230
     sp : ffff800008003e00
     x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff
     x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8
     x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000
     x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000
     x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0
     x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c
     x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398
     x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2
     x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100
     x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238
     Call trace:
      0x0
      net_rx_action+0x178/0x360
      __do_softirq+0x15c/0x428
      __irq_exit_rcu+0xac/0xec
      irq_exit+0x18/0x2c
      handle_domain_irq+0x6c/0xa0
      gic_handle_irq+0xec/0x1b0
      call_on_irq_stack+0x20/0x2c
      do_interrupt_handler+0x5c/0x70
      el1_interrupt+0x30/0x50
      el1h_64_irq_handler+0x18/0x2c
      el1h_64_irq+0x7c/0x80
      __setup_irq+0x4c0/0x950
      request_threaded_irq+0xf4/0x1bc
      mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige]
      mlxbf_gige_open+0x5c/0x170 [mlxbf_gige]
      __dev_open+0x100/0x220
      __dev_change_flags+0x16c/0x1f0
      dev_change_flags+0x2c/0x70
      do_setlink+0x220/0xa40
      __rtnl_newlink+0x56c/0x8a0
      rtnl_newlink+0x58/0x84
      rtnetlink_rcv_msg+0x138/0x3c4
      netlink_rcv_skb+0x64/0x130
      rtnetlink_rcv+0x20/0x30
      netlink_unicast+0x2ec/0x360
      netlink_sendmsg+0x278/0x490
      __sock_sendmsg+0x5c/0x6c
      ____sys_sendmsg+0x290/0x2d4
      ___sys_sendmsg+0x84/0xd0
      __sys_sendmsg+0x70/0xd0
      __arm64_sys_sendmsg+0x2c/0x40
      invoke_syscall+0x78/0x100
      el0_svc_common.constprop.0+0x54/0x184
      do_el0_svc+0x30/0xac
      el0_svc+0x48/0x160
      el0t_64_sync_handler+0xa4/0x12c
      el0t_64_sync+0x1a4/0x1a8
     Code: bad PC value
     ---[ end trace 7d1c3f3bf9d81885 ]---
     Kernel panic - not syncing: Oops: Fatal exception in interrupt
     Kernel Offset: 0x2870a7a00000 from 0xffff800008000000
     PHYS_OFFSET: 0x80000000
     CPU features: 0x0,000005c1,a3332a5a
     Memory Limit: none
     ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

The exception happens because there is a pending RX interrupt before the
call to request_irq(RX IRQ) executes.  Then, the RX IRQ handler fires
immediately after this request_irq() completes. The RX IRQ handler runs
"napi_schedule()" before NAPI is fully initialized via "netif_napi_add()"
and "napi_enable()", both which happen later in the open() logic.

The logic in mlxbf_gige_open() must fully initialize NAPI before any calls
to request_irq() execute.

Fixes: f92e186 ("Add Mellanox BlueField Gigabit Ethernet driver")
Signed-off-by: David Thompson <[email protected]>
Reviewed-by: Asmaa Mnebhi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request May 15, 2024
__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because  __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 torvalds#37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <[email protected]>
Cc: [email protected]
torvalds pushed a commit that referenced this pull request May 24, 2024
__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because  __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <[email protected]>
Cc: [email protected]
Reviewed-by: Alexandre Ghiti <[email protected]>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <[email protected]>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request May 28, 2024
While validating node ids in map_benchmark_ioctl(), node_possible() may
be provided with invalid argument outside of [0,MAX_NUMNODES-1] range
leading to:

BUG: KASAN: wild-memory-access in map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
Read of size 8 at addr 1fffffff8ccb6398 by task dma_map_benchma/971
CPU: 7 PID: 971 Comm: dma_map_benchma Not tainted 6.9.0-rc6 torvalds#37
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:117)
kasan_report (mm/kasan/report.c:603)
kasan_check_range (mm/kasan/generic.c:189)
variable_test_bit (arch/x86/include/asm/bitops.h:227) [inline]
arch_test_bit (arch/x86/include/asm/bitops.h:239) [inline]
_test_bit at (include/asm-generic/bitops/instrumented-non-atomic.h:142) [inline]
node_state (include/linux/nodemask.h:423) [inline]
map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
__x64_sys_ioctl (fs/ioctl.c:890)
do_syscall_64 (arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Compare node ids with sane bounds first. NUMA_NO_NODE is considered a
special valid case meaning that benchmarking kthreads won't be bound to a
cpuset of a given node.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 65789da ("dma-mapping: add benchmark support for streaming DMA APIs")
Signed-off-by: Fedor Pchelkin <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Kaz205 pushed a commit to Kaz205/linux that referenced this pull request Jun 5, 2024
[ Upstream commit 1ff05e7 ]

While validating node ids in map_benchmark_ioctl(), node_possible() may
be provided with invalid argument outside of [0,MAX_NUMNODES-1] range
leading to:

BUG: KASAN: wild-memory-access in map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
Read of size 8 at addr 1fffffff8ccb6398 by task dma_map_benchma/971
CPU: 7 PID: 971 Comm: dma_map_benchma Not tainted 6.9.0-rc6 torvalds#37
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:117)
kasan_report (mm/kasan/report.c:603)
kasan_check_range (mm/kasan/generic.c:189)
variable_test_bit (arch/x86/include/asm/bitops.h:227) [inline]
arch_test_bit (arch/x86/include/asm/bitops.h:239) [inline]
_test_bit at (include/asm-generic/bitops/instrumented-non-atomic.h:142) [inline]
node_state (include/linux/nodemask.h:423) [inline]
map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
__x64_sys_ioctl (fs/ioctl.c:890)
do_syscall_64 (arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Compare node ids with sane bounds first. NUMA_NO_NODE is considered a
special valid case meaning that benchmarking kthreads won't be bound to a
cpuset of a given node.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 65789da ("dma-mapping: add benchmark support for streaming DMA APIs")
Signed-off-by: Fedor Pchelkin <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Jun 6, 2024
[ Upstream commit 1ff05e7 ]

While validating node ids in map_benchmark_ioctl(), node_possible() may
be provided with invalid argument outside of [0,MAX_NUMNODES-1] range
leading to:

BUG: KASAN: wild-memory-access in map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
Read of size 8 at addr 1fffffff8ccb6398 by task dma_map_benchma/971
CPU: 7 PID: 971 Comm: dma_map_benchma Not tainted 6.9.0-rc6 torvalds#37
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:117)
kasan_report (mm/kasan/report.c:603)
kasan_check_range (mm/kasan/generic.c:189)
variable_test_bit (arch/x86/include/asm/bitops.h:227) [inline]
arch_test_bit (arch/x86/include/asm/bitops.h:239) [inline]
_test_bit at (include/asm-generic/bitops/instrumented-non-atomic.h:142) [inline]
node_state (include/linux/nodemask.h:423) [inline]
map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
__x64_sys_ioctl (fs/ioctl.c:890)
do_syscall_64 (arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Compare node ids with sane bounds first. NUMA_NO_NODE is considered a
special valid case meaning that benchmarking kthreads won't be bound to a
cpuset of a given node.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 65789da ("dma-mapping: add benchmark support for streaming DMA APIs")
Signed-off-by: Fedor Pchelkin <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Jun 6, 2024
[ Upstream commit 1ff05e7 ]

While validating node ids in map_benchmark_ioctl(), node_possible() may
be provided with invalid argument outside of [0,MAX_NUMNODES-1] range
leading to:

BUG: KASAN: wild-memory-access in map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
Read of size 8 at addr 1fffffff8ccb6398 by task dma_map_benchma/971
CPU: 7 PID: 971 Comm: dma_map_benchma Not tainted 6.9.0-rc6 torvalds#37
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:117)
kasan_report (mm/kasan/report.c:603)
kasan_check_range (mm/kasan/generic.c:189)
variable_test_bit (arch/x86/include/asm/bitops.h:227) [inline]
arch_test_bit (arch/x86/include/asm/bitops.h:239) [inline]
_test_bit at (include/asm-generic/bitops/instrumented-non-atomic.h:142) [inline]
node_state (include/linux/nodemask.h:423) [inline]
map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
__x64_sys_ioctl (fs/ioctl.c:890)
do_syscall_64 (arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Compare node ids with sane bounds first. NUMA_NO_NODE is considered a
special valid case meaning that benchmarking kthreads won't be bound to a
cpuset of a given node.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 65789da ("dma-mapping: add benchmark support for streaming DMA APIs")
Signed-off-by: Fedor Pchelkin <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Jun 9, 2024
[ Upstream commit 1ff05e7 ]

While validating node ids in map_benchmark_ioctl(), node_possible() may
be provided with invalid argument outside of [0,MAX_NUMNODES-1] range
leading to:

BUG: KASAN: wild-memory-access in map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
Read of size 8 at addr 1fffffff8ccb6398 by task dma_map_benchma/971
CPU: 7 PID: 971 Comm: dma_map_benchma Not tainted 6.9.0-rc6 torvalds#37
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:117)
kasan_report (mm/kasan/report.c:603)
kasan_check_range (mm/kasan/generic.c:189)
variable_test_bit (arch/x86/include/asm/bitops.h:227) [inline]
arch_test_bit (arch/x86/include/asm/bitops.h:239) [inline]
_test_bit at (include/asm-generic/bitops/instrumented-non-atomic.h:142) [inline]
node_state (include/linux/nodemask.h:423) [inline]
map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
__x64_sys_ioctl (fs/ioctl.c:890)
do_syscall_64 (arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Compare node ids with sane bounds first. NUMA_NO_NODE is considered a
special valid case meaning that benchmarking kthreads won't be bound to a
cpuset of a given node.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 65789da ("dma-mapping: add benchmark support for streaming DMA APIs")
Signed-off-by: Fedor Pchelkin <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Jun 9, 2024
[ Upstream commit 1ff05e7 ]

While validating node ids in map_benchmark_ioctl(), node_possible() may
be provided with invalid argument outside of [0,MAX_NUMNODES-1] range
leading to:

BUG: KASAN: wild-memory-access in map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
Read of size 8 at addr 1fffffff8ccb6398 by task dma_map_benchma/971
CPU: 7 PID: 971 Comm: dma_map_benchma Not tainted 6.9.0-rc6 torvalds#37
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:117)
kasan_report (mm/kasan/report.c:603)
kasan_check_range (mm/kasan/generic.c:189)
variable_test_bit (arch/x86/include/asm/bitops.h:227) [inline]
arch_test_bit (arch/x86/include/asm/bitops.h:239) [inline]
_test_bit at (include/asm-generic/bitops/instrumented-non-atomic.h:142) [inline]
node_state (include/linux/nodemask.h:423) [inline]
map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
__x64_sys_ioctl (fs/ioctl.c:890)
do_syscall_64 (arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Compare node ids with sane bounds first. NUMA_NO_NODE is considered a
special valid case meaning that benchmarking kthreads won't be bound to a
cpuset of a given node.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 65789da ("dma-mapping: add benchmark support for streaming DMA APIs")
Signed-off-by: Fedor Pchelkin <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
adeepn pushed a commit to jethome-iot/linux-kernel that referenced this pull request Jun 10, 2024
While validating node ids in map_benchmark_ioctl(), node_possible() may
be provided with invalid argument outside of [0,MAX_NUMNODES-1] range
leading to:

BUG: KASAN: wild-memory-access in map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
Read of size 8 at addr 1fffffff8ccb6398 by task dma_map_benchma/971
CPU: 7 PID: 971 Comm: dma_map_benchma Not tainted 6.9.0-rc6 torvalds#37
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:117)
kasan_report (mm/kasan/report.c:603)
kasan_check_range (mm/kasan/generic.c:189)
variable_test_bit (arch/x86/include/asm/bitops.h:227) [inline]
arch_test_bit (arch/x86/include/asm/bitops.h:239) [inline]
_test_bit at (include/asm-generic/bitops/instrumented-non-atomic.h:142) [inline]
node_state (include/linux/nodemask.h:423) [inline]
map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
__x64_sys_ioctl (fs/ioctl.c:890)
do_syscall_64 (arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Compare node ids with sane bounds first. NUMA_NO_NODE is considered a
special valid case meaning that benchmarking kthreads won't be bound to a
cpuset of a given node.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 65789da ("dma-mapping: add benchmark support for streaming DMA APIs")
Signed-off-by: Fedor Pchelkin <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
hdeller pushed a commit to hdeller/linux that referenced this pull request Jun 12, 2024
[ Upstream commit 1ff05e7 ]

While validating node ids in map_benchmark_ioctl(), node_possible() may
be provided with invalid argument outside of [0,MAX_NUMNODES-1] range
leading to:

BUG: KASAN: wild-memory-access in map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
Read of size 8 at addr 1fffffff8ccb6398 by task dma_map_benchma/971
CPU: 7 PID: 971 Comm: dma_map_benchma Not tainted 6.9.0-rc6 torvalds#37
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:117)
kasan_report (mm/kasan/report.c:603)
kasan_check_range (mm/kasan/generic.c:189)
variable_test_bit (arch/x86/include/asm/bitops.h:227) [inline]
arch_test_bit (arch/x86/include/asm/bitops.h:239) [inline]
_test_bit at (include/asm-generic/bitops/instrumented-non-atomic.h:142) [inline]
node_state (include/linux/nodemask.h:423) [inline]
map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
__x64_sys_ioctl (fs/ioctl.c:890)
do_syscall_64 (arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Compare node ids with sane bounds first. NUMA_NO_NODE is considered a
special valid case meaning that benchmarking kthreads won't be bound to a
cpuset of a given node.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 65789da ("dma-mapping: add benchmark support for streaming DMA APIs")
Signed-off-by: Fedor Pchelkin <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
staging-kernelci-org pushed a commit to kernelci/linux that referenced this pull request Jun 12, 2024
[ Upstream commit 1ff05e7 ]

While validating node ids in map_benchmark_ioctl(), node_possible() may
be provided with invalid argument outside of [0,MAX_NUMNODES-1] range
leading to:

BUG: KASAN: wild-memory-access in map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
Read of size 8 at addr 1fffffff8ccb6398 by task dma_map_benchma/971
CPU: 7 PID: 971 Comm: dma_map_benchma Not tainted 6.9.0-rc6 torvalds#37
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:117)
kasan_report (mm/kasan/report.c:603)
kasan_check_range (mm/kasan/generic.c:189)
variable_test_bit (arch/x86/include/asm/bitops.h:227) [inline]
arch_test_bit (arch/x86/include/asm/bitops.h:239) [inline]
_test_bit at (include/asm-generic/bitops/instrumented-non-atomic.h:142) [inline]
node_state (include/linux/nodemask.h:423) [inline]
map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
__x64_sys_ioctl (fs/ioctl.c:890)
do_syscall_64 (arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Compare node ids with sane bounds first. NUMA_NO_NODE is considered a
special valid case meaning that benchmarking kthreads won't be bound to a
cpuset of a given node.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 65789da ("dma-mapping: add benchmark support for streaming DMA APIs")
Signed-off-by: Fedor Pchelkin <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
staging-kernelci-org pushed a commit to kernelci/linux that referenced this pull request Jun 12, 2024
[ Upstream commit 1ff05e7 ]

While validating node ids in map_benchmark_ioctl(), node_possible() may
be provided with invalid argument outside of [0,MAX_NUMNODES-1] range
leading to:

BUG: KASAN: wild-memory-access in map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
Read of size 8 at addr 1fffffff8ccb6398 by task dma_map_benchma/971
CPU: 7 PID: 971 Comm: dma_map_benchma Not tainted 6.9.0-rc6 torvalds#37
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:117)
kasan_report (mm/kasan/report.c:603)
kasan_check_range (mm/kasan/generic.c:189)
variable_test_bit (arch/x86/include/asm/bitops.h:227) [inline]
arch_test_bit (arch/x86/include/asm/bitops.h:239) [inline]
_test_bit at (include/asm-generic/bitops/instrumented-non-atomic.h:142) [inline]
node_state (include/linux/nodemask.h:423) [inline]
map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
__x64_sys_ioctl (fs/ioctl.c:890)
do_syscall_64 (arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Compare node ids with sane bounds first. NUMA_NO_NODE is considered a
special valid case meaning that benchmarking kthreads won't be bound to a
cpuset of a given node.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 65789da ("dma-mapping: add benchmark support for streaming DMA APIs")
Signed-off-by: Fedor Pchelkin <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
john-cabaj pushed a commit to UbuntuAsahi/linux that referenced this pull request Jun 17, 2024
BugLink: https://bugs.launchpad.net/bugs/2065400

[ Upstream commit f7442a6 ]

The mlxbf_gige driver encounters a NULL pointer exception in
mlxbf_gige_open() when kdump is enabled.  The sequence to reproduce
the exception is as follows:
a) enable kdump
b) trigger kdump via "echo c > /proc/sysrq-trigger"
c) kdump kernel executes
d) kdump kernel loads mlxbf_gige module
e) the mlxbf_gige module runs its open() as the
   the "oob_net0" interface is brought up
f) mlxbf_gige module will experience an exception
   during its open(), something like:

     Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
     Mem abort info:
       ESR = 0x0000000086000004
       EC = 0x21: IABT (current EL), IL = 32 bits
       SET = 0, FnV = 0
       EA = 0, S1PTW = 0
       FSC = 0x04: level 0 translation fault
     user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000
     [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
     Internal error: Oops: 0000000086000004 [#1] SMP
     CPU: 0 PID: 812 Comm: NetworkManager Tainted: G           OE     5.15.0-1035-bluefield torvalds#37-Ubuntu
     Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024
     pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
     pc : 0x0
     lr : __napi_poll+0x40/0x230
     sp : ffff800008003e00
     x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff
     x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8
     x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000
     x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000
     x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0
     x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c
     x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398
     x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2
     x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100
     x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238
     Call trace:
      0x0
      net_rx_action+0x178/0x360
      __do_softirq+0x15c/0x428
      __irq_exit_rcu+0xac/0xec
      irq_exit+0x18/0x2c
      handle_domain_irq+0x6c/0xa0
      gic_handle_irq+0xec/0x1b0
      call_on_irq_stack+0x20/0x2c
      do_interrupt_handler+0x5c/0x70
      el1_interrupt+0x30/0x50
      el1h_64_irq_handler+0x18/0x2c
      el1h_64_irq+0x7c/0x80
      __setup_irq+0x4c0/0x950
      request_threaded_irq+0xf4/0x1bc
      mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige]
      mlxbf_gige_open+0x5c/0x170 [mlxbf_gige]
      __dev_open+0x100/0x220
      __dev_change_flags+0x16c/0x1f0
      dev_change_flags+0x2c/0x70
      do_setlink+0x220/0xa40
      __rtnl_newlink+0x56c/0x8a0
      rtnl_newlink+0x58/0x84
      rtnetlink_rcv_msg+0x138/0x3c4
      netlink_rcv_skb+0x64/0x130
      rtnetlink_rcv+0x20/0x30
      netlink_unicast+0x2ec/0x360
      netlink_sendmsg+0x278/0x490
      __sock_sendmsg+0x5c/0x6c
      ____sys_sendmsg+0x290/0x2d4
      ___sys_sendmsg+0x84/0xd0
      __sys_sendmsg+0x70/0xd0
      __arm64_sys_sendmsg+0x2c/0x40
      invoke_syscall+0x78/0x100
      el0_svc_common.constprop.0+0x54/0x184
      do_el0_svc+0x30/0xac
      el0_svc+0x48/0x160
      el0t_64_sync_handler+0xa4/0x12c
      el0t_64_sync+0x1a4/0x1a8
     Code: bad PC value
     ---[ end trace 7d1c3f3bf9d81885 ]---
     Kernel panic - not syncing: Oops: Fatal exception in interrupt
     Kernel Offset: 0x2870a7a00000 from 0xffff800008000000
     PHYS_OFFSET: 0x80000000
     CPU features: 0x0,000005c1,a3332a5a
     Memory Limit: none
     ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

The exception happens because there is a pending RX interrupt before the
call to request_irq(RX IRQ) executes.  Then, the RX IRQ handler fires
immediately after this request_irq() completes. The RX IRQ handler runs
"napi_schedule()" before NAPI is fully initialized via "netif_napi_add()"
and "napi_enable()", both which happen later in the open() logic.

The logic in mlxbf_gige_open() must fully initialize NAPI before any calls
to request_irq() execute.

Fixes: f92e186 ("Add Mellanox BlueField Gigabit Ethernet driver")
Signed-off-by: David Thompson <[email protected]>
Reviewed-by: Asmaa Mnebhi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Manuel Diewald <[email protected]>
Signed-off-by: Roxana Nicolescu <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Jun 19, 2024
commit fb1cf08 upstream.

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because  __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 torvalds#37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <[email protected]>
Cc: [email protected]
Reviewed-by: Alexandre Ghiti <[email protected]>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Jun 19, 2024
commit fb1cf08 upstream.

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because  __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 torvalds#37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <[email protected]>
Cc: [email protected]
Reviewed-by: Alexandre Ghiti <[email protected]>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Kaz205 pushed a commit to Kaz205/linux that referenced this pull request Jun 20, 2024
commit fb1cf08 upstream.

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because  __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 torvalds#37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <[email protected]>
Cc: [email protected]
Reviewed-by: Alexandre Ghiti <[email protected]>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
ptr1337 pushed a commit to CachyOS/linux that referenced this pull request Jun 21, 2024
commit fb1cf08 upstream.

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because  __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 torvalds#37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <[email protected]>
Cc: [email protected]
Reviewed-by: Alexandre Ghiti <[email protected]>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
staging-kernelci-org pushed a commit to kernelci/linux that referenced this pull request Jun 21, 2024
commit fb1cf08 upstream.

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because  __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 torvalds#37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <[email protected]>
Cc: [email protected]
Reviewed-by: Alexandre Ghiti <[email protected]>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
staging-kernelci-org pushed a commit to kernelci/linux that referenced this pull request Jun 21, 2024
commit fb1cf08 upstream.

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because  __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 torvalds#37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <[email protected]>
Cc: [email protected]
Reviewed-by: Alexandre Ghiti <[email protected]>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
staging-kernelci-org pushed a commit to kernelci/linux that referenced this pull request Jul 16, 2024
[ Upstream commit 1ff05e7 ]

While validating node ids in map_benchmark_ioctl(), node_possible() may
be provided with invalid argument outside of [0,MAX_NUMNODES-1] range
leading to:

BUG: KASAN: wild-memory-access in map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
Read of size 8 at addr 1fffffff8ccb6398 by task dma_map_benchma/971
CPU: 7 PID: 971 Comm: dma_map_benchma Not tainted 6.9.0-rc6 torvalds#37
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:117)
kasan_report (mm/kasan/report.c:603)
kasan_check_range (mm/kasan/generic.c:189)
variable_test_bit (arch/x86/include/asm/bitops.h:227) [inline]
arch_test_bit (arch/x86/include/asm/bitops.h:239) [inline]
_test_bit at (include/asm-generic/bitops/instrumented-non-atomic.h:142) [inline]
node_state (include/linux/nodemask.h:423) [inline]
map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
__x64_sys_ioctl (fs/ioctl.c:890)
do_syscall_64 (arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Compare node ids with sane bounds first. NUMA_NO_NODE is considered a
special valid case meaning that benchmarking kthreads won't be bound to a
cpuset of a given node.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 65789da ("dma-mapping: add benchmark support for streaming DMA APIs")
Signed-off-by: Fedor Pchelkin <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant