Pull request from torvalds/master (New commits - merge) #3

Merged
merged 9,031 commits from torvalds/master into AshishNamdev:master on Jan 26, 2014

Conversation

AshishNamdev
Owner

No description provided.

sshah-solarflare and others added 30 commits January 23, 2014 13:40
I will be taking over the work from Ben Hutchings.

Signed-off-by: Shradha Shah <[email protected]>
Acked-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
As part of a workaround for a hardware erratum in the SFC9100 family
(SF bug 35388), the TX_DESC_UPD_DWORD register address is also used
for communicating with the event block, and only descriptor pointer
values < 2048 are valid.

If the TX DMA ring size is increased to 4096 descriptors (which the
firmware still allows) then we may write a descriptor pointer
value >= 2048, which has entirely different and undesirable effects!

Limit the TX DMA ring size correctly when this workaround is in
effect.
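
A minimal sketch of the idea, assuming illustrative names (EFX_WORKAROUND_35388()
and efx->txq_entries here stand in for whatever the real driver uses):

  /* Keep descriptor pointer values below 2048 while the erratum
   * workaround is active, even though the firmware allows 4096. */
  unsigned int max_entries = EFX_WORKAROUND_35388(efx) ? 2048 : 4096;

  if (efx->txq_entries > max_entries)
          efx->txq_entries = max_entries;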

Fixes: 8127d66 ('sfc: Add support for Solarflare SFC9100 family')
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: Shradha Shah <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
After the option conversion, downdelay and updelay divide a u64, and on
32-bit builds this causes the following errors:
ERROR: "__udivdi3" [drivers/net/bonding/bonding.ko] undefined!
ERROR: "__umoddi3" [drivers/net/bonding/bonding.ko] undefined!

Fix it by using a normal int instead because newval->value is capped
at INT_MAX by the way the option is defined.
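
The shape of the fix, sketched with assumed variable names rather than the
exact bonding option handler:

  /* Do the arithmetic in a plain int; newval->value is capped at INT_MAX,
   * so no 64-bit division helpers are needed on 32-bit builds. */
  int value = newval->value;      /* previously the u64 was divided directly */

  if (value % bond->params.miimon)
          value = (value / bond->params.miimon) * bond->params.miimon;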

Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
o Use boolean type instead of u8.

Signed-off-by: Sucheta Chakraborty <[email protected]>
Signed-off-by: Jitendra Kalsaria <[email protected]>
Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
o Dump each Tx queue's details with all descriptors, queue indices
  and Tx queue stats to improve data collection in situations
  where a Tx timeout occurs.

Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
o Added hardware ops for interrupt enable/disable functions

Signed-off-by: Manish Chopra <[email protected]>
Signed-off-by: Shahed Shaikh <[email protected]>
Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Add support for MSI/MSI-X mode in poll controller routine.

Signed-off-by: Manish Chopra <[email protected]>
Signed-off-by: Shahed Shaikh <[email protected]>
Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
o Refactor configuration of interrupt coalescing parameters for
  all supported adapters.

Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
o Refactored MSI-x vector calculation for all adapters.
  Decoupled logic in the code which was using the same call to
  request MSI-x vectors during default driver load, as well as
  during set_channel() operation for TSS/RSS. This refactoring
  simplifies code for the TSS/RSS code path as well as the probe path
  of the driver load for all adapters.

Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Himanshu Madhani says:

====================
qlcnic: Refactoring and enhancements for all adapters.

This patch series contains following patches.

o Enhanced debug data collection when we are in a Tx-timeout situation.
o Enhanced MSI-x vector calculation for the default load path as well
  as for the TSS/RSS ring change path.
o Refactored interrupt coalescing code for all adapters.
o Refactored interrupt handling as well as cleanup of the poll controller
  code path for all adapters.
o Changed rx_mac_learn type to boolean.

Please apply to net-next.
====================

Signed-off-by: Zoltan Kiss <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
This check is not needed because the same check is done before
fill_slave_info is used in rtnl_link_slave_info_fill.
Also, by removing this check, the kernel will fill in IFLA_INFO_SLAVE_KIND
even for slaves of masters which do not implement fill_slave_info.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
  drivers/staging/comedi/drivers/das6402.c: In function 'intr_handler':
  drivers/staging/comedi/drivers/das6402.c:164:3: error: implicit declaration of function 'outw_p' [-Werror=implicit-function-declaration]
  drivers/staging/speakup/speakup_dtlk.c: In function 'synth_probe':
  drivers/staging/speakup/speakup_dtlk.c:362:2: error: implicit declaration of function 'inw_p' [-Werror=implicit-function-declaration]

Signed-off-by: Geert Uytterhoeven <[email protected]>
Cc: Mikael Starvik <[email protected]>
Cc: Jesper Nilsson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Sort the exception table at build-time rather than during boot.

Microblaze is the same case as AARCH64, which is why the EM_MICROBLAZE
conditional check was added to allow cross-compilation on machines which
are not running the latest libc-dev.

Inspired by AARCH64 commit adace89 ("arm64: extable: sort the
exception table at build time").
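
A sketch of what such a conditional check typically looks like in
scripts/sortextable.c (the constant value is an assumption here, not quoted
from the patch):

  /* Allow building on hosts whose elf.h does not yet define EM_MICROBLAZE. */
  #ifndef EM_MICROBLAZE
  #define EM_MICROBLAZE 189
  #endif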

Signed-off-by: Michal Simek <[email protected]>
Acked-by: David Daney <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Documentation/vm/locking is a blast from the past.  In the entire git
history, it has had precisely three modifications.  Two of those look to
be pure renames, and the third was from 2005.

The doc contains such gems as:

> The page_table_lock is grabbed while holding the
> kernel_lock spinning monitor.

> Page stealers hold kernel_lock to protect against a bunch of
> races.

Or this which talks about mmap_sem:

> 4. The exception to this rule is expand_stack, which just
>    takes the read lock and the page_table_lock, this is ok
>    because it doesn't really modify fields anybody relies on.

expand_stack() doesn't take any locks any more directly, and the
mmap_sem acquisition was long ago moved up in to the page fault code
itself.

It could be argued that we need to rewrite this, but it is dangerous to
leave it as-is.  It will confuse more people than it helps.

Signed-off-by: Dave Hansen <[email protected]>
Cc: Hugh Dickins <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Wanpeng Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
The "compressor" and "enabled" params are currently hidden, this changes
them to read-only, so userspace can tell if zswap is enabled or not and
see what compressor is in use.

Signed-off-by: Dan Streetman <[email protected]>
Cc: Vladimir Murzin <[email protected]>
Cc: Bob Liu <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Weijie Yang <[email protected]>
Acked-by: Seth Jennings <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
bad_page() is cool in that it prints out a bunch of data about the page.
But, I can never remember which page flags are good and which are bad,
or whether ->index or ->mapping is required to be NULL.

This patch allows bad/dump_page() callers to specify a string about why
they are dumping the page and adds explanation strings to a number of
places.  It also adds a 'bad_flags' argument to bad_page(), which it
then dumps out separately from the flags which are actually set.

This way, the messages will show specifically why the page was bad and,
if a page-flag combination was the problem, specifically which flags it
is complaining about.
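
A hedged usage sketch of the described interface (argument details may differ
from the actual patch):

  /* Callers can now say why they are dumping the page... */
  dump_page(page, "page still charged to cgroup");

  /* ...and bad_page() can be told which flags it is complaining about. */
  bad_page(page, "PAGE_FLAGS_CHECK_AT_FREE flag(s) set",
           page->flags & PAGE_FLAGS_CHECK_AT_FREE);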

[[email protected]: switch to pr_alert]
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Christoph Lameter <[email protected]>
Cc: Andi Kleen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Since commit ff6a6da ("mm: accelerate munlock() treatment of THP
pages") munlock skips tail pages of a munlocked THP page.  There is some
attempt to prevent bad consequences of racing with a THP page split, but
code inspection indicates that there are two problems that may lead to a
non-fatal, yet wrong outcome.

First, __split_huge_page_refcount() copies flags including PageMlocked
from the head page to the tail pages.  Clearing PageMlocked by
munlock_vma_page() in the middle of this operation might result in part
of tail pages left with PageMlocked flag.  As the head page still
appears to be a THP page until all tail pages are processed,
munlock_vma_page() might think it munlocked the whole THP page and skip
all the former tail pages.  Before ff6a6da, those pages would be
cleared in further iterations of munlock_vma_pages_range(), but NR_MLOCK
would still become undercounted (related to the next point).

Second, NR_MLOCK accounting is based on a call to hpage_nr_pages() after
PageMlocked is cleared.  The accounting might also become
inconsistent due to a race with __split_huge_page_refcount():

- undercount when HUGE_PMD_NR is subtracted, but some tail pages are
  left with PageMlocked set and counted again (only possible before
  ff6a6da)

- overcount when hpage_nr_pages() sees a normal page (split has already
  finished), but the parallel split has meanwhile cleared PageMlocked from
  additional tail pages

This patch prevents both problems via extending the scope of lru_lock in
munlock_vma_page().  This is convenient because:

- __split_huge_page_refcount() takes lru_lock for its whole operation

- munlock_vma_page() typically takes lru_lock anyway for page isolation

As this becomes a second function where page isolation is done with
lru_lock already held, factor this out to a new
__munlock_isolate_lru_page() function and clean up the code around.

[[email protected]: avoid a coding-style ugly]
Signed-off-by: Vlastimil Babka <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
The vmalloc was introduced by 3332794 ("memcgroup: use vmalloc for
mem_cgroup allocation"), because at that time MAX_NUMNODES was used for
defining the per-node array in the mem_cgroup structure, so the
structure could be huge even if the system had only one NUMA node.

The situation was significantly improved by commit 45cf7eb ("memcg:
reduce the size of struct memcg 244-fold"), which made the size of the
mem_cgroup structure calculated dynamically depending on the real number
of NUMA nodes installed on the system (nr_node_ids), so now there is no
point in using vmalloc here: the structure is allocated rarely and on
most systems its size is about 1K.
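
A sketch of the resulting allocation (field layout and error handling
abbreviated; the per-node type name is an assumption):

  /* With the per-node array sized by nr_node_ids, a slab allocation is
   * enough; on most systems this is about 1K. */
  size_t size = sizeof(struct mem_cgroup) +
                nr_node_ids * sizeof(struct mem_cgroup_per_node *);
  struct mem_cgroup *memcg = kzalloc(size, GFP_KERNEL);   /* was vzalloc() */

  if (!memcg)
          return NULL;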

Signed-off-by: Vladimir Davydov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Glauber Costa <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
stable_page_flags() checks !PageHuge && PageTransCompound && PageLRU to
know whether a specified page is thp or not.  But sometimes that's not enough
and we fail to detect thp when the thp is on a pagevec.  This happens only
for a few seconds after LRU list operations, but it makes it difficult
to control our applications depending on this flag.

So this patch adds another check PageAnon to detect thps on pagevec.  It
might not give the future extensibility for thp pagecache, but it's OK
at least for now.
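
A sketch of the extended check in stable_page_flags(), with u being the KPF
bitmap that function builds (assumed form, not a verbatim hunk):

  /* Treat a compound, non-hugetlb page as THP if it is on the LRU or is
   * anonymous; the latter catches THPs still sitting on a pagevec. */
  if (!PageHuge(page) && PageTransCompound(page) &&
      (PageLRU(page) || PageAnon(page)))
          u |= 1 << KPF_THP;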

Signed-off-by: Naoya Horiguchi <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Wu Fengguang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Most of the VM_BUG_ON assertions are performed on a page.  Usually, when
one of these assertions fails we'll get a BUG_ON with a call stack and
the registers.

I've recently noticed based on the requests to add a small piece of code
that dumps the page to various VM_BUG_ON sites that the page dump is
quite useful to people debugging issues in mm.

This patch adds a VM_BUG_ON_PAGE(cond, page) which beyond doing what
VM_BUG_ON() does, also dumps the page before executing the actual
BUG_ON.
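
Roughly, the new assertion has this shape (a sketch consistent with the
description above, not a verbatim copy):

  #define VM_BUG_ON_PAGE(cond, page)                                    \
          do {                                                          \
                  if (unlikely(cond)) {                                 \
                          dump_page(page, "VM_BUG_ON_PAGE(" #cond ")"); \
                          BUG();                                        \
                  }                                                     \
          } while (0)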

[[email protected]: fix up includes]
Signed-off-by: Sasha Levin <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Currently kmem_cache_create_memcg() backs off on failure inside
conditionals, without using gotos.  This results in rollback code
duplication, which makes the function look cumbersome even though on
error we should only free the allocated cache.  Since in the next patch
I am going to add yet another rollback function call on the error path
there, let's employ labels instead of conditionals for undoing any
changes on failure, to keep things clean.
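
The general shape of the goto-based unwind being introduced (a sketch with
placeholder helper names, not the exact function body):

  s = kmem_cache_zalloc(kmem_cache, GFP_KERNEL);
  if (!s)
          goto out;

  err = setup_cache_params(s);      /* placeholder for the real setup calls */
  if (err)
          goto out_free_cache;

  return s;

  out_free_cache:
          kmem_cache_free(kmem_cache, s);
  out:
          return NULL;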

Signed-off-by: Vladimir Davydov <[email protected]>
Reviewed-by: Pekka Enberg <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Glauber Costa <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
We do not free the cache's memcg_params if __kmem_cache_create fails.
Fix this.

Plus, rename memcg_register_cache() to memcg_alloc_cache_params(),
because it actually does not register the cache anywhere, but simply
initializes kmem_cache::memcg_params.

[[email protected]: fix build]
Signed-off-by: Vladimir Davydov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Glauber Costa <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Currently, we have rather a messy function set relating to per-memcg
kmem cache initialization/destruction.

Per-memcg caches are created in memcg_create_kmem_cache().  This
function calls kmem_cache_create_memcg() to allocate and initialize a
kmem cache and then "registers" the new cache in the
memcg_params::memcg_caches array of the parent cache.

During its work-flow, kmem_cache_create_memcg() executes the following
memcg-related functions:

 - memcg_alloc_cache_params(), to initialize memcg_params of the newly
   created cache;
 - memcg_cache_list_add(), to add the new cache to the memcg_slab_caches
   list.

On the other hand, kmem_cache_destroy() called on a cache destruction
only calls memcg_release_cache(), which does all the work: it cleans the
reference to the cache in its parent's memcg_params::memcg_caches,
removes the cache from the memcg_slab_caches list, and frees
memcg_params.

Such an inconsistency between the destruction and initialization paths makes
the code difficult to read, so let's clean this up a bit.

This patch moves all the code relating to registration of per-memcg
caches (adding to memcg list, setting the pointer to a cache from its
parent) to the newly created memcg_register_cache() and
memcg_unregister_cache() functions making the initialization and
destruction paths look symmetrical.

Signed-off-by: Vladimir Davydov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Glauber Costa <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Each root kmem_cache has pointers to per-memcg caches stored in its
memcg_params::memcg_caches array.  Whenever we want to allocate a slab
for a memcg, we access this array to get per-memcg cache to allocate
from (see memcg_kmem_get_cache()).  The access must be lock-free for
performance reasons, so we should use barriers to assert the kmem_cache
is up-to-date.

First, we should place a write barrier immediately before setting the
pointer to it in the memcg_caches array in order to make sure nobody
will see a partially initialized object.  Second, we should issue a read
barrier before dereferencing the pointer to conform to the write
barrier.

However, currently the barrier usage looks rather strange.  We have a
write barrier *after* setting the pointer and a read barrier *before*
reading the pointer, which is incorrect.  This patch fixes this.
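
The corrected ordering, sketched (helper names are placeholders):

  /* Writer: make the cache fully initialized before publishing it. */
  init_memcg_cache(new_cachep);
  smp_wmb();                                   /* barrier before the store */
  root->memcg_params->memcg_caches[idx] = new_cachep;

  /* Reader: pair with the write barrier before dereferencing. */
  cachep = root->memcg_params->memcg_caches[idx];
  smp_read_barrier_depends();                  /* barrier after the load */
  if (cachep)
          use_cache(cachep);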

Signed-off-by: Vladimir Davydov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Glauber Costa <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
All caches of the same memory cgroup are linked in the memcg_slab_caches
list via kmem_cache::memcg_params::list.  This list is traversed, for
example, when we read memory.kmem.slabinfo.

Since the list actually consists of memcg_cache_params objects, we have
to convert an element of the list to a kmem_cache object using
memcg_params_to_cache(), which obtains the pointer to the cache from the
memcg_params::memcg_caches array of the corresponding root cache.  That
said the pointer to a kmem_cache in its parent's memcg_params must be
initialized before adding the cache to the list, and cleared only after
it has been unlinked.  Currently it is vice-versa, which can result in a
NULL ptr dereference while traversing the memcg_slab_caches list.  This
patch restores the correct order.

Signed-off-by: Vladimir Davydov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Glauber Costa <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
We obtain a per-memcg cache from a root kmem_cache by dereferencing an
entry of the root cache's memcg_params::memcg_caches array.  If we find
no cache for a memcg there on allocation, we initiate the memcg cache
creation (see memcg_kmem_get_cache()).  The cache creation proceeds
asynchronously in memcg_create_kmem_cache() in order to avoid lock
clashes, so there can be several threads trying to create the same
kmem_cache concurrently, but only one of them may succeed.  However, due
to a race in the code, it is not always true.  The point is that the
memcg_caches array can be relocated when we activate kmem accounting for
a memcg (see memcg_update_all_caches(), memcg_update_cache_size()).  If
memcg_update_cache_size() and memcg_create_kmem_cache() proceed
concurrently as described below, we can leak a kmem_cache.

Assume two threads schedule creation of the same kmem_cache.  One of them
successfully creates it.  The other should then fail, but if
memcg_create_kmem_cache() interleaves with memcg_update_cache_size() as
follows, it won't:

  memcg_create_kmem_cache()             memcg_update_cache_size()
  (called w/o mutexes held)             (called with slab_mutex,
                                         set_limit_mutex held)
  -------------------------             -------------------------

  mutex_lock(&memcg_cache_mutex)

                                        s->memcg_params=kzalloc(...)

  new_cachep=cache_from_memcg_idx(cachep,idx)
  // new_cachep==NULL => proceed to creation

                                        s->memcg_params->memcg_caches[i]
                                            =cur_params->memcg_caches[i]

  // kmem_cache_create_memcg takes slab_mutex
  // so we will hang around until
  // memcg_update_cache_size finishes, but
  // nothing will prevent it from succeeding so
  // memcg_caches[idx] will be overwritten in
  // memcg_register_cache!

  new_cachep = kmem_cache_create_memcg(...)
  mutex_unlock(&memcg_cache_mutex)

Let's fix this by moving the check for existence of the memcg cache into
kmem_cache_create_memcg(), so that it is done under the slab_mutex, and make
it return NULL if the cache already exists.

A similar race is possible when destroying a memcg cache (see
kmem_cache_destroy()).  Since memcg_unregister_cache(), which clears the
pointer in the memcg_caches array, is called w/o protection, we can race
with memcg_update_cache_size() and omit clearing the pointer.  Therefore
memcg_unregister_cache() should be moved before we release the
slab_mutex.

Signed-off-by: Vladimir Davydov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Glauber Costa <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
kmem_cache_dup() is only called from memcg_create_kmem_cache().  The
latter, in fact, does nothing besides this, so let's fold
kmem_cache_dup() into memcg_create_kmem_cache().

This patch also makes the memcg_cache_mutex private to
memcg_create_kmem_cache(), because it is not used anywhere else.

Signed-off-by: Vladimir Davydov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Glauber Costa <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
There is no point in flooding logs with warnings or especially crashing
the system if we fail to create a cache for a memcg.  In this case we
will be accounting the memcg allocation to the root cgroup until we
succeed in creating its own cache, but it isn't that critical.

Signed-off-by: Vladimir Davydov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Glauber Costa <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
torvalds and others added 8 commits January 25, 2014 13:18
…/git/broonie/regmap

Pull regmap updates from Mark Brown:
 "Nothing terribly exciting with regmap this release, mainly a few small
  extensions to allow more devices to be supported:

   - Allow the bulk I/O APIs to be used with no-bus regmaps
   - Support interrupt controllers with zero ack base
   - Warning and spelling fixes"

* tag 'regmap-v3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: fix a couple of typos
  regmap: Allow regmap_bulk_write() to work for "no-bus" regmaps
  regmap: Allow regmap_bulk_read() to work for "no-bus" regmaps
  regmap: irq: Allow using zero value for ack_base
  regmap: Fix 'ret' would return an uninitialized value
…ernel/git/broonie/regulator

Pull regulator updates from Mark Brown:
 "A respin of the merges in the previous pull request with one extra
  fix.

  A quiet release for the regulator API, quite a large number of small
  improvements all over but other than the addition of new drivers for
  the AS3722 and MAX14577 there is nothing of substantial non-local
  impact"

* tag 'regulator-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (47 commits)
  regulator: pfuze100-regulator: Improve dev_info() message
  regulator: pfuze100-regulator: Fix some checkpatch complaints
  regulator: twl: Fix checkpatch issue
  regulator: core: Fix checkpatch issue
  regulator: anatop-regulator: Remove unneeded memset()
  regulator: s5m8767: Update LDO index in s5m8767-regulator.txt
  regulator: as3722: set enable time for SD0/1/6
  regulator: as3722: detect SD0 low-voltage mode
  regulator: tps62360: Fix up a pointer-integer size mismatch warning
  regulator: anatop-regulator: Remove unneeded kstrdup()
  regulator: act8865: Fix build error when !OF
  regulator: act8865: register all regulators regardless of how many are used
  regulator: wm831x-dcdc: Remove unneeded 'err' label
  regulator: anatop-regulator: Add MODULE_ALIAS()
  regulator: act8865: fix incorrect devm_kzalloc for act8865
  regulator: act8865: Remove set_suspend_[en|dis]able implementation
  regulator: act8865: Remove unneeded regulator_unregister() calls
  regulator: s2mps11: Clean up redundant code
  regulator: tps65910: Simplify setting enable_mask for regulators
  regulator: act8865: add device tree binding doc
  ...
…git/broonie/spi

Pull spi updates from Mark Brown:
 "A respun version of the merges for the pull request previously sent
  with a few additional fixes.  The last two merges were fixed up by
  hand since the branches have moved on and currently have the prior
  merge in them.

  Quite a busy release for the SPI subsystem, mostly in cleanups big and
  small scattered through the stack rather than anything else:

   - New driver for the Broadcom BC63xx HSSPI controller
   - Fix duplicate device registration for ACPI
   - Conversion of s3c64xx to DMAEngine (this pulls in platform and DMA
     changes upon which the transition depends)
   - Some small optimisations to reduce the amount of time we hold locks
     in the datapath, eliminate some redundant checks and the size of a
     spi_transfer
   - Lots of fixes, cleanups and general enhancements to drivers,
     especially the rspi and Atmel drivers"

* tag 'spi-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (112 commits)
  spi: core: Fix transfer failure when master->transfer_one returns positive value
  spi: Correct set_cs() documentation
  spi: Clarify transfer_one() w.r.t. spi_finalize_current_transfer()
  spi: Spelling s/finised/finished/
  spi: sc18is602: Convert to use bits_per_word_mask
  spi: Remove duplicate code to set default bits_per_word setting
  spi/pxa2xx: fix compilation warning when !CONFIG_PM_SLEEP
  spi: clps711x: Add MODULE_ALIAS to support module auto-loading
  spi: rspi: Add missing clk_disable() calls in error and cleanup paths
  spi: rspi: Spelling s/transmition/transmission/
  spi: rspi: Add support for specifying CPHA/CPOL
  spi/pxa2xx: initialize DMA channels to -1 to prevent inadvertent match
  spi: rspi: Add more QSPI register documentation
  spi: rspi: Add more RSPI register documentation
  spi: rspi: Remove dependency on DMAE for SHMOBILE
  spi/s3c64xx: Correct indentation
  spi: sh: Use spi_sh_clear_bit() instead of open-coded
  spi: bitbang: Grammar s/make to make/to make/
  spi: sh-hspi: Spelling s/recive/receive/
  spi: core: Improve tx/rx_nbits check comments
  ...
This patch removes the use of the IRQF_DISABLED flag.

It has been a NOOP since 2.6.35 and will be removed one day.
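
The change amounts to dropping the flag from request_irq()-style calls, for
example (device name and handler are illustrative, not quoted from the driver):

  /* before */
  rv = request_irq(irq, si_irq_handler, IRQF_DISABLED, "ipmi_si", info);
  /* after: IRQF_DISABLED has been a no-op since 2.6.35 */
  rv = request_irq(irq, si_irq_handler, 0, "ipmi_si", info);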

Signed-off-by: Michael Opdenacker <[email protected]>
Signed-off-by: Corey Minyard <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Use USEC_PER_SEC instead of 1000000, making the later bugfix clearer.
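
For illustration, the substitution is of this form (the exact expressions live
in the driver and are not quoted here):

  /* before */
  timeout = 5 * 1000000;          /* five seconds in microseconds */
  /* after */
  timeout = 5 * USEC_PER_SEC;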

Signed-off-by: Xie XiuQi <[email protected]>
Signed-off-by: Corey Minyard <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
When loading the ipmi_si module while the BMC is disconnected, we found the
timeout is longer than 5 secs.  It actually takes about 3 mins and 20
secs (HZ=250).

error message as below:
  Dec 12 19:08:59 linux kernel: IPMI BT: timeout in RD_WAIT [ ] 1 retries left
  Dec 12 19:08:59 linux kernel: BT: write 4 bytes seq=0x01 03 18 00 01
  [...]
  Dec 12 19:12:19 linux kernel: IPMI BT: timeout in RD_WAIT [ ]
  Dec 12 19:12:19 linux kernel: failed 2 retries, sending error response
  Dec 12 19:12:19 linux kernel: IPMI: BT reset (takes 5 secs)
  Dec 12 19:12:19 linux kernel: IPMI BT: flag reset [ ]

Function wait_for_msg_done() uses schedule_timeout_uninterruptible(1) to
sleep for 1 tick, so we should subtract jiffies_to_usecs(1) instead of 100
usecs from the timeout.
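
A sketch of the corrected accounting in the wait loop (surrounding logic
simplified):

  /* The loop sleeps a whole tick, so charge a whole tick against the
   * timeout rather than 100 usecs. */
  while (timeout > 0) {
          schedule_timeout_uninterruptible(1);
          timeout -= jiffies_to_usecs(1);   /* was: timeout -= 100; */
  }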

Reported-by: Hu Shiyuan <[email protected]>
Signed-off-by: Xie XiuQi <[email protected]>
Signed-off-by: Corey Minyard <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Return proper errors for a lot of IPMI failure cases.  Also call
pci_disable_device when IPMI PCI devices are removed.

Signed-off-by: Corey Minyard <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Merge ipmi fixes from Corey Minyard:
 "Just some collected fixes for 3.14.  Nothing huge"

* emailed patches from Corey Minyard <[email protected]>:
  ipmi: Cleanup error return
  ipmi: fix timeout calculation when bmc is disconnected
  ipmi: use USEC_PER_SEC instead of 1000000 for more meaningful
  ipmi: remove deprecated IRQF_DISABLED
AshishNamdev added a commit that referenced this pull request Jan 26, 2014
Pull request from torvalds/master  (New commits - merge)
AshishNamdev merged commit 8ed87cd into AshishNamdev:master on Jan 26, 2014
AshishNamdev pushed a commit that referenced this pull request Feb 22, 2014
This reverts commit f2d9b99.

We are ripping out commit 35773da "usb:
xhci: Link TRB must not occur within a USB payload burst" because it's a
hack that caused regressions in the usb-storage and userspace USB
drivers that use usbfs and libusb.  This commit attempted to fix the
issues with that patch.

Signed-off-by: Sarah Sharp <[email protected]>
Cc: [email protected] #3.12
AshishNamdev pushed a commit that referenced this pull request Feb 22, 2014
Setup cgroupfs like this:
  # mount -t cgroup -o cpuacct xxx /cgroup
  # mkdir /cgroup/sub1
  # mkdir /cgroup/sub2

Then run these two commands:
  # for ((; ;)) { mkdir /cgroup/sub1/tmp && rmdir /cgroup/sub1/tmp; } &
  # for ((; ;)) { mkdir /cgroup/sub2/tmp && rmdir /cgroup/sub2/tmp; } &

After a few seconds you may see this warning:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 25243 at lib/idr.c:527 sub_remove+0x87/0x1b0()
idr_remove called for id=6 which is not allocated.
...
Call Trace:
 [<ffffffff8156063c>] dump_stack+0x7a/0x96
 [<ffffffff810591ac>] warn_slowpath_common+0x8c/0xc0
 [<ffffffff81059296>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff81300aa7>] sub_remove+0x87/0x1b0
 [<ffffffff810f3f02>] ? css_killed_work_fn+0x32/0x1b0
 [<ffffffff81300bf5>] idr_remove+0x25/0xd0
 [<ffffffff810f2bab>] cgroup_destroy_css_killed+0x5b/0xc0
 [<ffffffff810f4000>] css_killed_work_fn+0x130/0x1b0
 [<ffffffff8107cdbc>] process_one_work+0x26c/0x550
 [<ffffffff8107eefe>] worker_thread+0x12e/0x3b0
 [<ffffffff81085f96>] kthread+0xe6/0xf0
 [<ffffffff81570bac>] ret_from_fork+0x7c/0xb0
---[ end trace 2d1577ec10cf80d0 ]---

It's because allocating/removing cgroup ID is not properly synchronized.

The bug was introduced when we converted cgroup_ida to cgroup_idr.
While synchronization is already done inside ida_simple_{get,remove}(),
users are responsible for synchronizing concurrent calls to idr_{alloc,remove}().

tj: Refreshed on top of b58c899 ("cgroup: fix error return from
cgroup_create()").

Fixes: 4e96ee8 ("cgroup: convert cgroup_ida to cgroup_idr")
Cc: <[email protected]> #3.12+
Reported-by: Michal Hocko <[email protected]>
Signed-off-by: Li Zefan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
AshishNamdev pushed a commit that referenced this pull request Feb 22, 2014
This patch addresses a >= v3.11 free-after-use regression
in core_scsi3_emulate_pro_register() that was introduced
in the following commit:

commit bc118fe
Author: Andy Grover <[email protected]>
Date:   Thu May 16 10:41:04 2013 -0700

    target: Further refactoring of core_scsi3_emulate_pro_register()

To avoid the free-after-use, save a type value beforehand, and
only call core_scsi3_put_pr_reg() with a valid *pr_reg.

Reported-by: Dan Carpenter <[email protected]>
Cc: Andy Grover <[email protected]>
Cc: <[email protected]> #3.11+
Signed-off-by: Nicholas Bellinger <[email protected]>
AshishNamdev pushed a commit that referenced this pull request Feb 22, 2014
commit 8b9ade9 coming from Viresh Kumar "tty: serial: sirfsoc: drop
uart_port->lock before calling tty_flip_buffer_push()" broke the sirfsoc uart
driver, triggering the following spinlock warning:

	[    5.129122] BUG: spinlock already unlocked on CPU#0, ip6tables/1331
	[    5.132554]  lock: sirfsoc_uart_ports+0x4/0x8a0, .magic: dead4ead,
	.owner: <none>/-1, .owner_cpu: -1
	[    5.141651] CPU: 0 PID: 1331 Comm: ip6tables Tainted: G
	W  O 3.10.16 #3
	[    5.148866] [<c0013528>] (unwind_backtrace+0x0/0xe0) from
	[<c0010e70>] (show_stack+0x10/0x14)
	[    5.157362] [<c0010e70>] (show_stack+0x10/0x14) from
	[<c01a5e68>] (do_raw_spin_unlock+0x40/0xc8)
	[    5.166125] [<c01a5e68>] (do_raw_spin_unlock+0x40/0xc8) from
	[<c03ff8b4>] (_raw_spin_unlock+0x8/0x40)
	[    5.175322] [<c03ff8b4>] (_raw_spin_unlock+0x8/0x40) from
	[<c0203fcc>] (sirfsoc_uart_pio_rx_chars+0xa4/0xc0)
	[    5.185120] [<c0203fcc>]
	(sirfsoc_uart_pio_rx_chars+0xa4/0xc0) from [<c0204fb8>]
	(sirfsoc_rx_tmo_process_tl+0xdc/0x1e0)
	[    5.195875] [<c0204fb8>]
	(sirfsoc_rx_tmo_process_tl+0xdc/0x1e0) from [<c0024b50>]
	(tasklet_action+0x8c/0xec)
	[    5.205673] [<c0024b50>] (tasklet_action+0x8c/0xec) from
	[<c00242a8>] (__do_softirq+0xec/0x1d4)
	[    5.214347] [<c00242a8>] (__do_softirq+0xec/0x1d4) from
	[<c0024428>] (do_softirq+0x48/0x54)
	[    5.222674] [<c0024428>] (do_softirq+0x48/0x54) from
	[<c0024690>] (irq_exit+0x74/0xc0)
	[    5.230573] [<c0024690>] (irq_exit+0x74/0xc0) from
	[<c000e1e8>] (handle_IRQ+0x6c/0x90)
	[    5.238465] [<c000e1e8>] (handle_IRQ+0x6c/0x90) from
	[<c000d500>] (__irq_svc+0x40/0x70)
	[    5.246446] [<c000d500>] (__irq_svc+0x40/0x70) from
	[<c0092e7c>] (mark_page_accessed+0xc/0x68)
	[    5.255034] [<c0092e7c>] (mark_page_accessed+0xc/0x68) from
	[<c00a2a4c>] (unmap_single_vma+0x3bc/0x550)
	[    5.264402] [<c00a2a4c>] (unmap_single_vma+0x3bc/0x550) from
	[<c00a3b4c>] (unmap_vmas+0x44/0x54)
	[    5.273164] [<c00a3b4c>] (unmap_vmas+0x44/0x54) from
	[<c00a81a8>] (exit_mmap+0xc4/0x1e0)
	[    5.281233] [<c00a81a8>] (exit_mmap+0xc4/0x1e0) from
	[<c001bb78>] (mmput+0x3c/0xdc)
	[    5.288868] [<c001bb78>] (mmput+0x3c/0xdc) from [<c0021b0c>]
	(do_exit+0x30c/0x828)
	[    5.296413] [<c0021b0c>] (do_exit+0x30c/0x828) from
	[<c0022dac>] (do_group_exit+0x4c/0xb0)
	[    5.304653] [<c0022dac>] (do_group_exit+0x4c/0xb0) from
	[<c0022e20>] (__wake_up_parent+0x0/0x18)

Root cause:
the commit dropped uart_port->lock before calling tty_flip_buffer_push(), but in sirfsoc-uart,
sirfsoc_uart_pio_rx_chars() can also be called from sirfsoc_rx_tmo_process_tl(), where uart_port->lock
has not been taken yet, so that caused an unpaired lock/unlock.

Solution:
This patch is a quick fix for that: it adds spin_lock/unlock(&port->lock) protection around
sirfsoc_uart_pio_rx_chars() in sirfsoc_rx_tmo_process_tl() to keep spin_lock/unlock calls paired.
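
A sketch of the pairing fix in the tasklet path (the second argument to the
PIO helper is an assumption):

  /* In sirfsoc_rx_tmo_process_tl(): take the port lock so the helper's
   * internal unlock/lock around tty_flip_buffer_push() stays balanced. */
  spin_lock(&port->lock);
  sirfsoc_uart_pio_rx_chars(port, max_rx_count);
  spin_unlock(&port->lock);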

Signed-off-by: Qipan Li <[email protected]>
Signed-off-by: Barry Song <[email protected]>
Cc: stable <[email protected]> # 3.12
Signed-off-by: Greg Kroah-Hartman <[email protected]>
AshishNamdev pushed a commit that referenced this pull request Mar 23, 2014
A selective retransmission request (SRR) is a fibre-channel
protocol control request which provides support for requesting
retransmission of a data sequence in response to an issue such as
frame loss or corruption.  These events are experienced
infrequently in fibre-channel based networks which makes
it difficult to test and assess codepaths which handle these
events.

We were fortunate enough, for some definition of fortunate, to
have a metro-area single-mode SAN link which, at 10 GBPS
sustained load levels, would consistently generate SRR's in
a SCST based target implementation using our SCST/in-kernel
Qlogic target interface driver.  In response to an SRR the
in-kernel Qlogic target driver immediately panics resulting
in a catastrophic storage failure for serviced initiators.

The culprit was a debug statement in the qla_target.c file which
does not verify that a pointer to the SCSI CDB is not null.
The unchecked pointer dereference results in the kernel panic
and resultant system failure.

The other two references to the SCSI CDB by the SRR handling code
use a ternary operator to verify a non-null pointer is being
acted on.  This patch simply adds a similar test to the implicated
debug statement.
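
The guard follows the pattern already used by the other CDB references in the
SRR path; roughly (the debug call, structure, and field names here are
assumptions rather than the actual hunk):

  ql_dbg(ql_dbg_tgt_mgt, vha, 0xf02e,
         "SRR cmd %p (se_cmd %p, tag %d, op %x)\n", cmd, &cmd->se_cmd,
         cmd->tag, cmd->se_cmd.t_task_cdb ? cmd->se_cmd.t_task_cdb[0] : 0);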

This patch is a candidate for any stable kernel being maintained
since it addresses a potentially catastrophic event with
minimal downside.

Signed-off-by: Dr. Greg Wettstein <[email protected]>
Cc: <[email protected]> #3.5+
Signed-off-by: Nicholas Bellinger <[email protected]>
AshishNamdev pushed a commit that referenced this pull request Mar 23, 2014
The memory allocated for the multiqueue policy's hash table doesn't need
to be physically contiguous.  Use vzalloc() instead of kzalloc().
Fedora has been carrying this fix since 10/10/2013.
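
For reference, the substitution has this shape (identifier names are
assumptions, not the actual dm-cache-policy-mq code):

  /* The hash table only needs to be virtually contiguous. */
  mq->table = vzalloc(nr_buckets * sizeof(*mq->table));  /* was kzalloc() */
  if (!mq->table)
          return -ENOMEM;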

Failure seen during creation of a 10TB cached device with a 2048 sector
block size and 411GB cache size:

 dmsetup: page allocation failure: order:9, mode:0x10c0d0
 CPU: 11 PID: 29235 Comm: dmsetup Not tainted 3.10.4 #3
 Hardware name: Supermicro X8DTL/X8DTL, BIOS 2.1a       12/30/2011
  000000000010c0d0 ffff880090941898 ffffffff81387ab4 ffff880090941928
  ffffffff810bb26f 0000000000000009 000000000010c0d0 ffff880090941928
  ffffffff81385dbc ffffffff815f3840 ffffffff00000000 000002000010c0d0
 Call Trace:
  [<ffffffff81387ab4>] dump_stack+0x19/0x1b
  [<ffffffff810bb26f>] warn_alloc_failed+0x110/0x124
  [<ffffffff81385dbc>] ? __alloc_pages_direct_compact+0x17c/0x18e
  [<ffffffff810bda2e>] __alloc_pages_nodemask+0x6c7/0x75e
  [<ffffffff810bdad7>] __get_free_pages+0x12/0x3f
  [<ffffffff810ea148>] kmalloc_order_trace+0x29/0x88
  [<ffffffff810ec1fd>] __kmalloc+0x36/0x11b
  [<ffffffffa031eeed>] ? mq_create+0x1dc/0x2cf [dm_cache_mq]
  [<ffffffffa031efc0>] mq_create+0x2af/0x2cf [dm_cache_mq]
  [<ffffffffa0314605>] dm_cache_policy_create+0xa7/0xd2 [dm_cache]
  [<ffffffffa0312530>] ? cache_ctr+0x245/0xa13 [dm_cache]
  [<ffffffffa031263e>] cache_ctr+0x353/0xa13 [dm_cache]
  [<ffffffffa012b916>] dm_table_add_target+0x227/0x2ce [dm_mod]
  [<ffffffffa012e8e4>] table_load+0x286/0x2ac [dm_mod]
  [<ffffffffa012e65e>] ? dev_wait+0x8a/0x8a [dm_mod]
  [<ffffffffa012e324>] ctl_ioctl+0x39a/0x3c2 [dm_mod]
  [<ffffffffa012e35a>] dm_ctl_ioctl+0xe/0x12 [dm_mod]
  [<ffffffff81101181>] vfs_ioctl+0x21/0x34
  [<ffffffff811019d3>] do_vfs_ioctl+0x3b1/0x3f4
  [<ffffffff810f4d2e>] ? ____fput+0x9/0xb
  [<ffffffff81050b6c>] ? task_work_run+0x7e/0x92
  [<ffffffff81101a68>] SyS_ioctl+0x52/0x82
  [<ffffffff81391d92>] system_call_fastpath+0x16/0x1b

Signed-off-by: Heinz Mauelshagen <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Cc: [email protected]
AshishNamdev pushed a commit that referenced this pull request Mar 23, 2014
[  365.164040] BUG: sleeping function called from invalid context at kernel/rtmutex.c:674
[  365.164041] in_atomic(): 1, irqs_disabled(): 1, pid: 26, name: migration/1
[  365.164043] no locks held by migration/1/26.
[  365.164044] irq event stamp: 6648
[  365.164056] hardirqs last  enabled at (6647): [<ffffffff8153d377>] restore_args+0x0/0x30
[  365.164062] hardirqs last disabled at (6648): [<ffffffff810ed98d>] multi_cpu_stop+0x9d/0x120
[  365.164070] softirqs last  enabled at (0): [<ffffffff810543bc>] copy_process.part.28+0x6fc/0x1920
[  365.164072] softirqs last disabled at (0): [<          (null)>]           (null)
[  365.164076] CPU: 1 PID: 26 Comm: migration/1 Tainted: GF           N  3.12.12-rt19-0.gcb6c4a2-rt #3
[  365.164078] Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S013.032920111005 03/29/2011
[  365.164091]  0000000000000001 ffff880a42ea7c30 ffffffff815367e6 ffffffff81a086c0
[  365.164099]  ffff880a42ea7c40 ffffffff8108919c ffff880a42ea7c60 ffffffff8153c24f
[  365.164107]  ffff880a42ea91f0 00000000ffffffe1 ffff880a42ea7c88 ffffffff81297ec0
[  365.164108] Call Trace:
[  365.164119]  [<ffffffff810060b1>] try_stack_unwind+0x191/0x1a0
[  365.164127]  [<ffffffff81004872>] dump_trace+0x92/0x360
[  365.164133]  [<ffffffff81006108>] show_trace_log_lvl+0x48/0x60
[  365.164138]  [<ffffffff81004c18>] show_stack_log_lvl+0xd8/0x1d0
[  365.164143]  [<ffffffff81006160>] show_stack+0x20/0x50
[  365.164153]  [<ffffffff815367e6>] dump_stack+0x54/0x9a
[  365.164163]  [<ffffffff8108919c>] __might_sleep+0xfc/0x140
[  365.164173]  [<ffffffff8153c24f>] rt_spin_lock+0x1f/0x70
[  365.164182]  [<ffffffff81297ec0>] blk_mq_main_cpu_notify+0x20/0x70
[  365.164191]  [<ffffffff81540a1c>] notifier_call_chain+0x4c/0x70
[  365.164201]  [<ffffffff81083499>] __raw_notifier_call_chain+0x9/0x10
[  365.164207]  [<ffffffff810567be>] cpu_notify+0x1e/0x40
[  365.164217]  [<ffffffff81525da2>] take_cpu_down+0x22/0x40
[  365.164223]  [<ffffffff810ed9c6>] multi_cpu_stop+0xd6/0x120
[  365.164229]  [<ffffffff810edd97>] cpu_stopper_thread+0xd7/0x1e0
[  365.164235]  [<ffffffff810863a3>] smpboot_thread_fn+0x203/0x380
[  365.164241]  [<ffffffff8107cbf8>] kthread+0xc8/0xd0
[  365.164250]  [<ffffffff8154440c>] ret_from_fork+0x7c/0xb0
[  365.164429] smpboot: CPU 1 is now offline

Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
AshishNamdev pushed a commit that referenced this pull request Mar 23, 2014
Running fsx on tmpfs with concurrent memhog-swapoff-swapon, lots of

  BUG: sleeping function called from invalid context at kernel/fork.c:606
  in_atomic(): 0, irqs_disabled(): 0, pid: 1394, name: swapoff
  1 lock held by swapoff/1394:
   #0:  (rcu_read_lock){.+.+.+}, at: [<ffffffff812520a1>] radix_tree_locate_item+0x1f/0x2b6

followed by

  ================================================
  [ BUG: lock held when returning to user space! ]
  3.14.0-rc1 #3 Not tainted
  ------------------------------------------------
  swapoff/1394 is leaving the kernel with locks still held!
  1 lock held by swapoff/1394:
   #0:  (rcu_read_lock){.+.+.+}, at: [<ffffffff812520a1>] radix_tree_locate_item+0x1f/0x2b6

after which the system recovered nicely.

Whoops, I long ago forgot the rcu_read_unlock() on one unlikely branch.
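
The shape of the omission, illustrated with placeholder helpers rather than
quoted from the radix-tree code:

  rcu_read_lock();
  item = lookup_item(root, index);         /* placeholder helper */
  if (unlikely(!item)) {
          rcu_read_unlock();               /* the unlock this branch was missing */
          return 0;
  }
  rcu_read_unlock();
  return item_index(item);                 /* placeholder helper */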

Fixes e504f3f ("tmpfs radix_tree: locate_item to speed up swapoff")

Signed-off-by: Hugh Dickins <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
AshishNamdev pushed a commit that referenced this pull request Mar 23, 2014
This patch fixes a bug in iscsit_get_tpg_from_np() where the
tpg->tpg_state sanity check was looking for TPG_STATE_FREE,
instead of != TPG_STATE_ACTIVE.

The latter is expected during a normal TPG shutdown once the
tpg_state goes into TPG_STATE_INACTIVE in order to reject any
new incoming login attempts.
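
Roughly, the corrected check (a sketch; surrounding locking elided):

  if (tpg->tpg_state != TPG_STATE_ACTIVE)   /* was: tpg->tpg_state == TPG_STATE_FREE */
          continue;                         /* skip TPGs that are shutting down */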

Cc: <[email protected]> #3.10+
Signed-off-by: Nicholas Bellinger <[email protected]>
AshishNamdev pushed a commit that referenced this pull request Mar 23, 2014
There are a handful of uses of list_empty() for cmd->i_conn_node
within iser-target code that expect it to return true once a cmd
has been removed from the per-connection list.

This patch changes all uses of list_del -> list_del_init in order
to ensure that list_empty() returns true as expected.

Acked-by: Sagi Grimberg <[email protected]>
Cc: Or Gerlitz <[email protected]>
Cc: <[email protected]> #3.10+
Signed-off-by: Nicholas Bellinger <[email protected]>
AshishNamdev pushed a commit that referenced this pull request Mar 23, 2014
This patch addresses a couple of different hung shutdown issues
related to wait_event() + isert_conn->state.  First, it changes
isert_conn->conn_wait + isert_conn->conn_wait_comp_err from
waitqueues to completions, and sets ISER_CONN_TERMINATING from
within isert_disconnect_work().

Second, it splits isert_free_conn() into isert_wait_conn() that
is called earlier in iscsit_close_connection() to ensure that
all outstanding commands have completed before continuing.

Finally, it breaks isert_cq_comp_err() into separate TX / RX
related code, and adds logic in isert_cq_rx_comp_err() to wait
for outstanding commands to complete before setting ISER_CONN_DOWN
and calling complete(&isert_conn->conn_wait_comp_err).

Acked-by: Sagi Grimberg <[email protected]>
Cc: Or Gerlitz <[email protected]>
Cc: <[email protected]> #3.10+
Signed-off-by: Nicholas Bellinger <[email protected]>
AshishNamdev pushed a commit that referenced this pull request Mar 23, 2014
This patch fixes the incorrect setting of ->post_send_buf_count
related to RDMA WRITEs + READs where isert_rdma_rw->send_wr_num
was not being taken into account.

This includes incrementing ->post_send_buf_count within
isert_put_datain() + isert_get_dataout(), decrementing within
__isert_send_completion() + isert_response_completion(), and
clearing wr->send_wr_num within isert_completion_rdma_read()

This is necessary because even though IB_SEND_SIGNALED is
not set for RDMA WRITEs + READs, during a QP failure event
the work requests will be returned with exception status
from the TX completion queue.

Acked-by: Sagi Grimberg <[email protected]>
Cc: Or Gerlitz <[email protected]>
Cc: <[email protected]> #3.10+
Signed-off-by: Nicholas Bellinger <[email protected]>
AshishNamdev pushed a commit that referenced this pull request Mar 23, 2014
This patch changes IB_WR_FAST_REG_MR + IB_WR_LOCAL_INV related
work requests to include a ISER_FRWR_LI_WRID value in order to
signal isert_cq_tx_work() that these requests should be ignored.

This is necessary because even though IB_SEND_SIGNALED is not
set for either work request, during a QP failure event the work
requests will be returned with exception status from the TX
completion queue.

v2 changes:
 - Rename ISER_FRWR_LI_WRID -> ISER_FASTREG_LI_WRID (Sagi)

Acked-by: Sagi Grimberg <[email protected]>
Cc: Or Gerlitz <[email protected]>
Cc: <[email protected]> #3.12+
Signed-off-by: Nicholas Bellinger <[email protected]>
AshishNamdev pushed a commit that referenced this pull request Mar 23, 2014
This patch addresses a number of active I/O shutdown issues
related to isert_cmd descriptors being leaked that are part
of a completion interrupt coalescing batch.

This includes adding logic in isert_cq_tx_comp_err() to
drain any associated tx_desc->comp_llnode_batch, as well
as isert_cq_drain_comp_llist() to drain any associated
isert_conn->conn_comp_llist.

Also, set tx_desc->llnode_active in isert_init_send_wr()
in order to determine when work requests need to be skipped
in isert_cq_tx_work() exception path code.

Finally, update isert_init_send_wr() to only allow interrupt
coalescing when ISER_CONN_UP.

Acked-by: Sagi Grimberg <[email protected]>
Cc: Or Gerlitz <[email protected]>
Cc: <[email protected]> #3.13+
Signed-off-by: Nicholas Bellinger <[email protected]>
AshishNamdev pushed a commit that referenced this pull request Mar 23, 2014
commit 655dada causes a kernel panic; this patch fixes it.

    [    1.197816] [ffffffee] *pgd=0d7fd821, *pte=00000000, *ppte=00000000
    [    1.204070] Internal error: Oops: 17 [#1] PREEMPT SMP ARM
    [    1.209447] Modules linked in:
    [    1.212490] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc1 #3
    [    1.218737] task: cd03c000 ti: cd040000 task.ti: cd040000
    [    1.224127] PC is at gpiod_lock_as_irq+0xc/0x64
    [    1.228634] LR is at sirfsoc_gpio_irq_startup+0x18/0x44
    [    1.233842] pc : [<c01d3990>]    lr : [<c01d1c38>]    psr: a0000193
    [    1.233842] sp : cd041d30  ip : 00000000  fp : 00000000
    [    1.245296] r10: 00000000  r9 : cd023db4  r8 : 60000113
    [    1.250505] r7 : 0000003e  r6 : cd023dd4  r5 : c06bfa54  r4 : cd023d80
    [    1.257014] r3 : 00000020  r2 : 00000000  r1 : ffffffea  r0 : ffffffea
    [    1.263526] Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
    [    1.270903] Control: 10c53c7d  Table: 00004059  DAC: 00000015
    [    1.276631] Process swapper/0 (pid: 1, stack limit = 0xcd040240)
    [    1.282620] Stack: (0xcd041d30 to 0xcd042000)
    [    1.286963] 1d20:                                     cd023d80 c01d1c38 c01d1c20 cd023d80
    [    1.295124] 1d40: 00000001 c0068438 cd023d80 ccb6d880 cd023dd4 c0067044 0000718e c006719c
    [    1.303283] 1d60: 00000800 00000083 ccb6d880 cd023d80 c02b41d8 00000083 0000003e ccb7c41
    [    1.311442] 1d80: 00000000 c00671dc 00000083 0000003e c02b41d8 cd3dd5c0 0000003e ccb7c634
    [    1.319601] 1da0: cd040030 c00672a8 cd3dd5c0 ccb7c41 ccb6d340 ccb7c41 ccb6d340 cd3dd400
    [    1.327760] 1dc0: cd3dd410 c02b4434 ccb7c41 c01265a8 00000001 cd3dd410 c0687108 00000000
    [    1.335919] 1de0: c0687108 00000000 00000000 c0240170 c0240158 cd3dd410 c06c30d0 c023e8bc
    [    1.344079] 1e00: c023e9d4 00000000 cd3dd410 c023e9d4 c0682150 c023cf88 cd003e98 cd2d50c4
    [    1.352238] 1e20: cd3dd410 cd3dd444 c06822f0 c023e768 cd3dd418 cd3dd410 c06822f0 c023de14
    [    1.360397] 1e40: cd3dd418 00000000 cd3dd410 c023c398 cd041e78 cd041ea8 cd3dd400 cd3dd410
    [    1.368556] 1e60: 00000083 00000000 cd3dd400 cd3dd410 00000083 000000c8 c04e00c8 c023fee8
    [    1.376715] 1e80: 00000000 cd041ea8 cd3dd400 00000001 00000083 c024048c c0435ef8 c0434dec
    [    1.384874] 1ea0: c068da58 c04c6d04 c0682150 c0435ef8 ffffffff 00000000 00000000 c068da58
    [    1.393033] 1ec0: 00000020 00000000 00000000 00000000 c05dabb8 00000007 c068d640 c068d640
    [    1.401193] 1ee0: c04c247c c04c249c 00000000 c00088e8 cd004c00 c043bbb8 cd029180 c03812a0
    [    1.409352] 1f00: 00000000 00000000 60000113 c0673728 60000113 c0673728 00000000 00000000
    [    1.417511] 1f20: cd7fce01 c0390a54 00000065 c003a81c c049e8bc 00000007 cd7fce0e 00000007
    [    1.425670] 1f40: 00000000 c05dabb8 00000007 c068d640 c068d640 c04c050c c04e00c8 00000065
    [    1.433829] 1f60: c04e00c0 c04c0c54 00000007 00000007 c04c050c c037d8fc cd03c000 c004322c
    [    1.441988] 1f80: c0662b40 0000d640 c03737c0 00000000 00000000 00000000 00000000 00000000
    [    1.450147] 1fa0: 00000000 c03737cc 00000000 c000e478 00000000 00000000 00000000 00000000
    [    1.458307] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    [    1.466467] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 0002d481 05014092
    [    1.474640] [<c01d3990>] (gpiod_lock_as_irq) from [<c01d1c38>] (sirfsoc_gpio_irq_startup+0x18/0x44)
    [    1.483661] [<c01d1c38>] (sirfsoc_gpio_irq_startup) from [<c0068438>] (irq_startup+0x34/0x6c)
    [    1.492163] [<c0068438>] (irq_startup) from [<c0067044>] (__setup_irq+0x450/0x4b8)
    [    1.499714] [<c0067044>] (__setup_irq) from [<c00671dc>] (request_threaded_irq+0xa8/0x128)
    [    1.507960] [<c00671dc>] (request_threaded_irq) from [<c00672a8>] (request_any_context_irq+0x4c/0x7c)
    [    1.517164] [<c00672a8>] (request_any_context_irq) from [<c02b4434>] (gpio_extcon_probe+0x144/0x1d4)
    [    1.526279] [<c02b4434>] (gpio_extcon_probe) from [<c0240170>] (platform_drv_probe+0x18/0x48)
    [    1.534783] [<c0240170>] (platform_drv_probe) from [<c023e8bc>] (driver_probe_device+0x120/0x238)
    [    1.543641] [<c023e8bc>] (driver_probe_device) from [<c023cf88>] (bus_for_each_drv+0x58/0x8c)
    [    1.552143] [<c023cf88>] (bus_for_each_drv) from [<c023e768>] (device_attach+0x74/0x88)
    [    1.560126] [<c023e768>] (device_attach) from [<c023de14>] (bus_probe_device+0x84/0xa8)
    [    1.568113] [<c023de14>] (bus_probe_device) from [<c023c398>] (device_add+0x440/0x520)
    [    1.576012] [<c023c398>] (device_add) from [<c023fee8>] (platform_device_add+0xb4/0x214)
    [    1.584084] [<c023fee8>] (platform_device_add) from [<c024048c>] (platform_device_register_full+0xb8/0xdc)
    [    1.593719] [<c024048c>] (platform_device_register_full) from [<c04c6d04>] (sirfsoc_init_late+0xec/0xf4)
    [    1.603185] [<c04c6d04>] (sirfsoc_init_late) from [<c04c249c>] (init_machine_late+0x20/0x28)
    [    1.611603] [<c04c249c>] (init_machine_late) from [<c00088e8>] (do_one_initcall+0xf8/0x144)
    [    1.619934] [<c00088e8>] (do_one_initcall) from [<c04c0c54>] (kernel_init_freeable+0x13c/0x1dc)
    [    1.628620] [<c04c0c54>] (kernel_init_freeable) from [<c03737cc>] (kernel_init+0xc/0x118)

Signed-off-by: Barry Song <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
AshishNamdev pushed a commit that referenced this pull request Mar 23, 2014
This fixes a subtle issue with cache flush which could potentially cause
random userspace crashes because of stale icache lines.

This error crept in when consolidating the cache flush code.

Fixes: bd12976 (ARC: cacheflush refactor #3: Unify the {d,i}cache)
Signed-off-by: Vineet Gupta <[email protected]>
Cc: [email protected]
Cc: [email protected]  # 3.13
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>
AshishNamdev pushed a commit that referenced this pull request Mar 23, 2014
vmxnet3's netpoll driver is incorrectly coded.  It directly calls
vmxnet3_do_poll, which is the driver internal napi poll routine.  As the netpoll
controller method doesn't block real napi polls in any way, there is a potential
for race conditions in which the netpoll controller method and the napi poll
method run concurrently.  The result is data corruption causing panics such as this
one recently observed:
PID: 1371   TASK: ffff88023762caa0  CPU: 1   COMMAND: "rs:main Q:Reg"
 #0 [ffff88023abd5780] machine_kexec at ffffffff81038f3b
 #1 [ffff88023abd57e0] crash_kexec at ffffffff810c5d92
 #2 [ffff88023abd58b0] oops_end at ffffffff8152b570
 #3 [ffff88023abd58e0] die at ffffffff81010e0b
 #4 [ffff88023abd5910] do_trap at ffffffff8152add4
 #5 [ffff88023abd5970] do_invalid_op at ffffffff8100cf95
 #6 [ffff88023abd5a10] invalid_op at ffffffff8100bf9b
    [exception RIP: vmxnet3_rq_rx_complete+1968]
    RIP: ffffffffa00f1e80  RSP: ffff88023abd5ac8  RFLAGS: 00010086
    RAX: 0000000000000000  RBX: ffff88023b5dcee0  RCX: 00000000000000c0
    RDX: 0000000000000000  RSI: 00000000000005f2  RDI: ffff88023b5dcee0
    RBP: ffff88023abd5b48   R8: 0000000000000000   R9: ffff88023a3b6048
    R10: 0000000000000000  R11: 0000000000000002  R12: ffff8802398d4cd8
    R13: ffff88023af35140  R14: ffff88023b60c890  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88023abd5b50] vmxnet3_do_poll at ffffffffa00f204a [vmxnet3]
 #8 [ffff88023abd5b80] vmxnet3_netpoll at ffffffffa00f209c [vmxnet3]
 #9 [ffff88023abd5ba0] netpoll_poll_dev at ffffffff81472bb7

The fix is to do as other drivers do, and have the poll controller call the top
half interrupt handler, which properly schedules a napi poll to receive frames.
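
A sketch of the described approach; the handler and adapter field names are
assumptions, and the real driver must also cope with per-queue MSI-X vectors:

  static void vmxnet3_netpoll(struct net_device *netdev)
  {
          struct vmxnet3_adapter *adapter = netdev_priv(netdev);

          /* Call the top-half handler so a NAPI poll gets scheduled the
           * normal way instead of polling the rings directly. */
          vmxnet3_intr(0, adapter->netdev);
  }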

Tested by myself, successfully.

Signed-off-by: Neil Horman <[email protected]>
CC: Shreyas Bhatewara <[email protected]>
CC: "VMware, Inc." <[email protected]>
CC: "David S. Miller" <[email protected]>
CC: [email protected]
Reviewed-by: Shreyas N Bhatewara <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
AshishNamdev pushed a commit that referenced this pull request Jul 2, 2014
cxgb4_netdev may lead to a deadlock, since it takes a spin lock and can be called
in both thread and softirq context, but does not disable BH; the lockdep report is
below.  In fact, cxgb4_netdev only reads adap_rcu_list with RCU protection, so there
is no need to hold the spin lock as well.
	=================================
	[ INFO: inconsistent lock state ]
	3.14.7+ #24 Tainted: G         C O
	---------------------------------
	inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
	radvd/3794 [HC0[0]:SC1[1]:HE1:SE0] takes:
	 (adap_rcu_lock){+.?...}, at: [<ffffffffa09989ea>] clip_add+0x2c/0x116 [cxgb4]
	{SOFTIRQ-ON-W} state was registered at:
	  [<ffffffff810fca81>] __lock_acquire+0x34a/0xe48
	  [<ffffffff810fd98b>] lock_acquire+0x82/0x9d
	  [<ffffffff815d6ff8>] _raw_spin_lock+0x34/0x43
	  [<ffffffffa09989ea>] clip_add+0x2c/0x116 [cxgb4]
	  [<ffffffffa0998beb>] cxgb4_inet6addr_handler+0x117/0x12c [cxgb4]
	  [<ffffffff815da98b>] notifier_call_chain+0x32/0x5c
	  [<ffffffff815da9f9>] __atomic_notifier_call_chain+0x44/0x6e
	  [<ffffffff815daa32>] atomic_notifier_call_chain+0xf/0x11
	  [<ffffffff815b1356>] inet6addr_notifier_call_chain+0x16/0x18
	  [<ffffffffa01f72e5>] ipv6_add_addr+0x404/0x46e [ipv6]
	  [<ffffffffa01f8df0>] addrconf_add_linklocal+0x5f/0x95 [ipv6]
	  [<ffffffffa01fc3e9>] addrconf_notify+0x632/0x841 [ipv6]
	  [<ffffffff815da98b>] notifier_call_chain+0x32/0x5c
	  [<ffffffff810e09a1>] __raw_notifier_call_chain+0x9/0xb
	  [<ffffffff810e09b2>] raw_notifier_call_chain+0xf/0x11
	  [<ffffffff8151b3b7>] call_netdevice_notifiers_info+0x4e/0x56
	  [<ffffffff8151b3d0>] call_netdevice_notifiers+0x11/0x13
	  [<ffffffff8151c0a6>] netdev_state_change+0x1f/0x38
	  [<ffffffff8152f004>] linkwatch_do_dev+0x3b/0x49
	  [<ffffffff8152f184>] __linkwatch_run_queue+0x10b/0x144
	  [<ffffffff8152f1dd>] linkwatch_event+0x20/0x27
	  [<ffffffff810d7bc0>] process_one_work+0x1cb/0x2ee
	  [<ffffffff810d7e3b>] worker_thread+0x12e/0x1fc
	  [<ffffffff810dd391>] kthread+0xc4/0xcc
	  [<ffffffff815dc48c>] ret_from_fork+0x7c/0xb0
	irq event stamp: 3388
	hardirqs last  enabled at (3388): [<ffffffff810c6c85>]
	__local_bh_enable_ip+0xaa/0xd9
	hardirqs last disabled at (3387): [<ffffffff810c6c2d>]
	__local_bh_enable_ip+0x52/0xd9
	softirqs last  enabled at (3288): [<ffffffffa01f1d5b>]
	rcu_read_unlock_bh+0x0/0x2f [ipv6]
	softirqs last disabled at (3289): [<ffffffff815ddafc>]
	do_softirq_own_stack+0x1c/0x30

	other info that might help us debug this:
	 Possible unsafe locking scenario:

	       CPU0
	       ----
	  lock(adap_rcu_lock);
	  <Interrupt>
	    lock(adap_rcu_lock);

	 *** DEADLOCK ***

	5 locks held by radvd/3794:
	 #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffffa020b85a>]
	rawv6_sendmsg+0x74b/0xa4d [ipv6]
	 #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8151ac6b>]
	rcu_lock_acquire+0x0/0x29
	 #2:  (rcu_read_lock){.+.+..}, at: [<ffffffffa01f4cca>]
	rcu_lock_acquire.constprop.16+0x0/0x30 [ipv6]
	 #3:  (rcu_read_lock){.+.+..}, at: [<ffffffff810e09b4>]
	rcu_lock_acquire+0x0/0x29
	 #4:  (rcu_read_lock){.+.+..}, at: [<ffffffffa0998782>]
	rcu_lock_acquire.constprop.40+0x0/0x30 [cxgb4]

	stack backtrace:
	CPU: 7 PID: 3794 Comm: radvd Tainted: G         C O 3.14.7+ #24
	Hardware name: Supermicro X7DBU/X7DBU, BIOS 6.00 12/03/2007
	 ffffffff81f15990 ffff88012fdc36a8 ffffffff815d0016 0000000000000006
	 ffff8800c80dc2a0 ffff88012fdc3708 ffffffff815cc727 0000000000000001
	 0000000000000001 ffff880100000000 ffffffff81015b02 ffff8800c80dcb58
	Call Trace:
	 <IRQ>  [<ffffffff815d0016>] dump_stack+0x4e/0x71
	 [<ffffffff815cc727>] print_usage_bug+0x1ec/0x1fd
	 [<ffffffff81015b02>] ? save_stack_trace+0x27/0x44
	 [<ffffffff810fbfaa>] ? check_usage_backwards+0xa0/0xa0
	 [<ffffffff810fc640>] mark_lock+0x11b/0x212
	 [<ffffffff810fca0b>] __lock_acquire+0x2d4/0xe48
	 [<ffffffff810fbfaa>] ? check_usage_backwards+0xa0/0xa0
	 [<ffffffff810fbff6>] ? check_usage_forwards+0x4c/0xa6
	 [<ffffffff810c6c8a>] ? __local_bh_enable_ip+0xaf/0xd9
	 [<ffffffff810fd98b>] lock_acquire+0x82/0x9d
	 [<ffffffffa09989ea>] ? clip_add+0x2c/0x116 [cxgb4]
	 [<ffffffffa0998782>] ? rcu_read_unlock+0x23/0x23 [cxgb4]
	 [<ffffffff815d6ff8>] _raw_spin_lock+0x34/0x43
	 [<ffffffffa09989ea>] ? clip_add+0x2c/0x116 [cxgb4]
	 [<ffffffffa09987b0>] ? rcu_lock_acquire.constprop.40+0x2e/0x30 [cxgb4]
	 [<ffffffffa0998782>] ? rcu_read_unlock+0x23/0x23 [cxgb4]
	 [<ffffffffa09989ea>] clip_add+0x2c/0x116 [cxgb4]
	 [<ffffffffa0998beb>] cxgb4_inet6addr_handler+0x117/0x12c [cxgb4]
	 [<ffffffff810fd99d>] ? lock_acquire+0x94/0x9d
	 [<ffffffff810e09b4>] ? raw_notifier_call_chain+0x11/0x11
	 [<ffffffff815da98b>] notifier_call_chain+0x32/0x5c
	 [<ffffffff815da9f9>] __atomic_notifier_call_chain+0x44/0x6e
	 [<ffffffff815daa32>] atomic_notifier_call_chain+0xf/0x11
	 [<ffffffff815b1356>] inet6addr_notifier_call_chain+0x16/0x18
	 [<ffffffffa01f72e5>] ipv6_add_addr+0x404/0x46e [ipv6]
	 [<ffffffff810fde6a>] ? trace_hardirqs_on+0xd/0xf
	 [<ffffffffa01fb634>] addrconf_prefix_rcv+0x385/0x6ea [ipv6]
	 [<ffffffffa0207950>] ndisc_rcv+0x9d3/0xd76 [ipv6]
	 [<ffffffffa020d536>] icmpv6_rcv+0x592/0x67b [ipv6]
	 [<ffffffff810c6c85>] ? __local_bh_enable_ip+0xaa/0xd9
	 [<ffffffff810c6c85>] ? __local_bh_enable_ip+0xaa/0xd9
	 [<ffffffff810fd8dc>] ? lock_release+0x14e/0x17b
	 [<ffffffffa020df97>] ? rcu_read_unlock+0x21/0x23 [ipv6]
	 [<ffffffff8150df52>] ? rcu_read_unlock+0x23/0x23
	 [<ffffffffa01f4ede>] ip6_input_finish+0x1e4/0x2fc [ipv6]
	 [<ffffffffa01f540b>] ip6_input+0x33/0x38 [ipv6]
	 [<ffffffffa01f5557>] ip6_mc_input+0x147/0x160 [ipv6]
	 [<ffffffffa01f4ba3>] ip6_rcv_finish+0x7c/0x81 [ipv6]
	 [<ffffffffa01f5397>] ipv6_rcv+0x3a1/0x3e2 [ipv6]
	 [<ffffffff8151ef96>] __netif_receive_skb_core+0x4ab/0x511
	 [<ffffffff810fdc94>] ? mark_held_locks+0x71/0x99
	 [<ffffffff8151f0c0>] ? process_backlog+0x69/0x15e
	 [<ffffffff8151f045>] __netif_receive_skb+0x49/0x5b
	 [<ffffffff8151f0cf>] process_backlog+0x78/0x15e
	 [<ffffffff8151f571>] ? net_rx_action+0x1a2/0x1cc
	 [<ffffffff8151f47b>] net_rx_action+0xac/0x1cc
	 [<ffffffff810c69b7>] ? __do_softirq+0xad/0x218
	 [<ffffffff810c69ff>] __do_softirq+0xf5/0x218
	 [<ffffffff815ddafc>] do_softirq_own_stack+0x1c/0x30
	 <EOI>  [<ffffffff810c6bb6>] do_softirq+0x38/0x5d
	 [<ffffffffa01f1d5b>] ? ip6_copy_metadata+0x156/0x156 [ipv6]
	 [<ffffffff810c6c78>] __local_bh_enable_ip+0x9d/0xd9
	 [<ffffffffa01f1d88>] rcu_read_unlock_bh+0x2d/0x2f [ipv6]
	 [<ffffffffa01f28b4>] ip6_finish_output2+0x381/0x3d8 [ipv6]
	 [<ffffffffa01f49ef>] ip6_finish_output+0x6e/0x73 [ipv6]
	 [<ffffffffa01f4a70>] ip6_output+0x7c/0xa8 [ipv6]
	 [<ffffffff815b1bfa>] dst_output+0x18/0x1c
	 [<ffffffff815b1c9e>] ip6_local_out+0x1c/0x21
	 [<ffffffffa01f2489>] ip6_push_pending_frames+0x37d/0x427 [ipv6]
	 [<ffffffff81558af8>] ? skb_orphan+0x39/0x39
	 [<ffffffffa020b85a>] ? rawv6_sendmsg+0x74b/0xa4d [ipv6]
	 [<ffffffffa020ba51>] rawv6_sendmsg+0x942/0xa4d [ipv6]
	 [<ffffffff81584cd2>] inet_sendmsg+0x3d/0x66
	 [<ffffffff81508930>] __sock_sendmsg_nosec+0x25/0x27
	 [<ffffffff8150b0d7>] sock_sendmsg+0x5a/0x7b
	 [<ffffffff810fd8dc>] ? lock_release+0x14e/0x17b
	 [<ffffffff8116d756>] ? might_fault+0x9e/0xa5
	 [<ffffffff8116d70d>] ? might_fault+0x55/0xa5
	 [<ffffffff81508cb1>] ? copy_from_user+0x2a/0x2c
	 [<ffffffff8150b70c>] ___sys_sendmsg+0x226/0x2d9
	 [<ffffffff810fcd25>] ? __lock_acquire+0x5ee/0xe48
	 [<ffffffff810fde01>] ? trace_hardirqs_on_caller+0x145/0x1a1
	 [<ffffffff8118efcb>] ? slab_free_hook.isra.71+0x50/0x59
	 [<ffffffff8115c81f>] ? release_pages+0xbc/0x181
	 [<ffffffff810fd99d>] ? lock_acquire+0x94/0x9d
	 [<ffffffff81115e97>] ? read_seqcount_begin.constprop.25+0x73/0x90
	 [<ffffffff8150c408>] __sys_sendmsg+0x3d/0x5b
	 [<ffffffff8150c433>] SyS_sendmsg+0xd/0x19
	 [<ffffffff815dc53d>] system_call_fastpath+0x1a/0x1f
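
A minimal sketch of the RCU-only read described above (the adap_rcu_list name
comes from the message; the example_* helper and field names are assumptions,
not the driver's actual code):

	/* The adapter list is only read here, so rcu_read_lock() is enough:
	 * no adap_rcu_lock spin lock and no BH juggling are needed, and the
	 * walk is safe from both thread and softirq context.
	 */
	static struct example_adapter *example_find_adapter(const struct net_device *dev)
	{
		struct example_adapter *adap, *found = NULL;

		rcu_read_lock();
		list_for_each_entry_rcu(adap, &adap_rcu_list, rcu_node)
			if (adap->netdev == dev) {
				found = adap;
				break;
			}
		rcu_read_unlock();

		return found;
	}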

Reported-by: Ben Greear <[email protected]>
Cc: Casey Leedom <[email protected]>
Cc: Hariprasad Shenai <[email protected]>
Signed-off-by: Li RongQing <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Casey Leedom <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
AshishNamdev pushed a commit that referenced this pull request Oct 2, 2014
This reverts commit 8b37e1b.

It's broken because it changes led_blink_set() so that it can now sleep
(while synchronously waiting for the blink work to be cancelled). That's a
problem, because this function can get called from atomic
context (tpt_trig_timer() takes a readlock and thus disables preemption).
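
Condensed sketch of the unsafe combination (the lock, list and work names are
taken from the trace below; the call shape is illustrative only):

	/* tpt_trig_timer() walks the trigger's LED list under a read lock,
	 * i.e. with preemption disabled ...
	 */
	read_lock(&trig->leddev_list_lock);
	list_for_each_entry(led_cdev, &trig->led_cdevs, trig_list)
		led_blink_set(led_cdev, &delay_on, &delay_off);
	read_unlock(&trig->leddev_list_lock);

	/* ... while the reverted change makes led_blink_set() start with a
	 * synchronous cancel that may sleep:
	 */
	cancel_delayed_work_sync(&led_cdev->blink_work);	/* sleeping call in atomic context -> BUG */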

This was brought up 3 weeks ago already [1], but no proper fix has
materialized, and I keep seeing the problem since 3.17-rc1.

[1] https://lkml.org/lkml/2014/8/16/128

 BUG: sleeping function called from invalid context at kernel/workqueue.c:2650
 in_atomic(): 1, irqs_disabled(): 0, pid: 2335, name: wpa_supplicant
 5 locks held by wpa_supplicant/2335:
  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff814c7c92>] rtnl_lock+0x12/0x20
  #1:  (&wdev->mtx){+.+.+.}, at: [<ffffffffc06e649c>] cfg80211_mgd_wext_siwessid+0x5c/0x180 [cfg80211]
  #2:  (&local->mtx){+.+.+.}, at: [<ffffffffc0817dea>] ieee80211_prep_connection+0x17a/0x9a0 [mac80211]
  #3:  (&local->chanctx_mtx){+.+.+.}, at: [<ffffffffc08081ed>] ieee80211_vif_use_channel+0x5d/0x2a0 [mac80211]
  #4:  (&trig->leddev_list_lock){.+.+..}, at: [<ffffffffc081e68c>] tpt_trig_timer+0xec/0x170 [mac80211]
 CPU: 0 PID: 2335 Comm: wpa_supplicant Not tainted 3.17.0-rc3 #1
 Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
  ffff8800360b5a50 ffff8800751f76d8 ffffffff8159e97f ffff8800360b5a30
  ffff8800751f76e8 ffffffff810739a5 ffff8800751f77b0 ffffffff8106862f
  ffffffff810685d0 0aa2209200000000 ffff880000000004 ffff8800361c59d0
 Call Trace:
  [<ffffffff8159e97f>] dump_stack+0x4d/0x66
  [<ffffffff810739a5>] __might_sleep+0xe5/0x120
  [<ffffffff8106862f>] flush_work+0x5f/0x270
  [<ffffffff810685d0>] ? mod_delayed_work_on+0x80/0x80
  [<ffffffff810945ca>] ? mark_held_locks+0x6a/0x90
  [<ffffffff81068a5f>] ? __cancel_work_timer+0x6f/0x100
  [<ffffffff810946ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0
  [<ffffffff81068a6b>] __cancel_work_timer+0x7b/0x100
  [<ffffffff81068b0e>] cancel_delayed_work_sync+0xe/0x10
  [<ffffffff8147cf3b>] led_blink_set+0x1b/0x40
  [<ffffffffc081e6b0>] tpt_trig_timer+0x110/0x170 [mac80211]
  [<ffffffffc081ecdd>] ieee80211_mod_tpt_led_trig+0x9d/0x160 [mac80211]
  [<ffffffffc07e4278>] __ieee80211_recalc_idle+0x98/0x140 [mac80211]
  [<ffffffffc07e59ce>] ieee80211_idle_off+0xe/0x10 [mac80211]
  [<ffffffffc0804e5b>] ieee80211_add_chanctx+0x3b/0x220 [mac80211]
  [<ffffffffc08062e4>] ieee80211_new_chanctx+0x44/0xf0 [mac80211]
  [<ffffffffc080838a>] ieee80211_vif_use_channel+0x1fa/0x2a0 [mac80211]
  [<ffffffffc0817df8>] ieee80211_prep_connection+0x188/0x9a0 [mac80211]
  [<ffffffffc081c246>] ieee80211_mgd_auth+0x256/0x2e0 [mac80211]
  [<ffffffffc07eab33>] ieee80211_auth+0x13/0x20 [mac80211]
  [<ffffffffc06cb006>] cfg80211_mlme_auth+0x106/0x270 [cfg80211]
  [<ffffffffc06ce085>] cfg80211_conn_do_work+0x155/0x3b0 [cfg80211]
  [<ffffffffc06cf670>] cfg80211_connect+0x3f0/0x540 [cfg80211]
  [<ffffffffc06e6148>] cfg80211_mgd_wext_connect+0x158/0x1f0 [cfg80211]
  [<ffffffffc06e651e>] cfg80211_mgd_wext_siwessid+0xde/0x180 [cfg80211]
  [<ffffffffc06e36c0>] ? cfg80211_wext_giwessid+0x50/0x50 [cfg80211]
  [<ffffffffc06e36dd>] cfg80211_wext_siwessid+0x1d/0x40 [cfg80211]
  [<ffffffff81584d0c>] ioctl_standard_iw_point+0x14c/0x3e0
  [<ffffffff810946ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0
  [<ffffffff8158502a>] ioctl_standard_call+0x8a/0xd0
  [<ffffffff81584fa0>] ? ioctl_standard_iw_point+0x3e0/0x3e0
  [<ffffffff81584b76>] wireless_process_ioctl.constprop.10+0xb6/0x100
  [<ffffffff8158521d>] wext_handle_ioctl+0x5d/0xb0
  [<ffffffff814cfb29>] dev_ioctl+0x329/0x620
  [<ffffffff810946ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0
  [<ffffffff8149c7f2>] sock_ioctl+0x142/0x2e0
  [<ffffffff811b0140>] do_vfs_ioctl+0x300/0x520
  [<ffffffff815a67fb>] ? sysret_check+0x1b/0x56
  [<ffffffff810946ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0
  [<ffffffff811b03e1>] SyS_ioctl+0x81/0xa0
  [<ffffffff815a67d6>] system_call_fastpath+0x1a/0x1f
 wlan0: send auth to 00:0b:6b:3c:8c:e4 (try 1/3)
 wlan0: authenticated
 wlan0: associate with 00:0b:6b:3c:8c:e4 (try 1/3)
 wlan0: RX AssocResp from 00:0b:6b:3c:8c:e4 (capab=0x431 status=0 aid=2)
 wlan0: associated
 IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
 cfg80211: Calling CRDA for country: NA
 wlan0: Limiting TX power to 27 (27 - 0) dBm as advertised by 00:0b:6b:3c:8c:e4

 =================================
 [ INFO: inconsistent lock state ]
 3.17.0-rc3 #1 Not tainted
 ---------------------------------
 inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
 swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
  ((&(&led_cdev->blink_work)->work)){+.?...}, at: [<ffffffff810685d0>] flush_work+0x0/0x270
 {SOFTIRQ-ON-W} state was registered at:
   [<ffffffff81094dbe>] __lock_acquire+0x30e/0x1a30
   [<ffffffff81096c81>] lock_acquire+0x91/0x110
   [<ffffffff81068608>] flush_work+0x38/0x270
   [<ffffffff81068a6b>] __cancel_work_timer+0x7b/0x100
   [<ffffffff81068b0e>] cancel_delayed_work_sync+0xe/0x10
   [<ffffffff8147cf3b>] led_blink_set+0x1b/0x40
   [<ffffffffc081e6b0>] tpt_trig_timer+0x110/0x170 [mac80211]
   [<ffffffffc081ecdd>] ieee80211_mod_tpt_led_trig+0x9d/0x160 [mac80211]
   [<ffffffffc07e4278>] __ieee80211_recalc_idle+0x98/0x140 [mac80211]
   [<ffffffffc07e59ce>] ieee80211_idle_off+0xe/0x10 [mac80211]
   [<ffffffffc0804e5b>] ieee80211_add_chanctx+0x3b/0x220 [mac80211]
   [<ffffffffc08062e4>] ieee80211_new_chanctx+0x44/0xf0 [mac80211]
   [<ffffffffc080838a>] ieee80211_vif_use_channel+0x1fa/0x2a0 [mac80211]
   [<ffffffffc0817df8>] ieee80211_prep_connection+0x188/0x9a0 [mac80211]
   [<ffffffffc081c246>] ieee80211_mgd_auth+0x256/0x2e0 [mac80211]
   [<ffffffffc07eab33>] ieee80211_auth+0x13/0x20 [mac80211]
   [<ffffffffc06cb006>] cfg80211_mlme_auth+0x106/0x270 [cfg80211]
   [<ffffffffc06ce085>] cfg80211_conn_do_work+0x155/0x3b0 [cfg80211]
   [<ffffffffc06cf670>] cfg80211_connect+0x3f0/0x540 [cfg80211]
   [<ffffffffc06e6148>] cfg80211_mgd_wext_connect+0x158/0x1f0 [cfg80211]
   [<ffffffffc06e651e>] cfg80211_mgd_wext_siwessid+0xde/0x180 [cfg80211]
   [<ffffffffc06e36dd>] cfg80211_wext_siwessid+0x1d/0x40 [cfg80211]
   [<ffffffff81584d0c>] ioctl_standard_iw_point+0x14c/0x3e0
   [<ffffffff8158502a>] ioctl_standard_call+0x8a/0xd0
   [<ffffffff81584b76>] wireless_process_ioctl.constprop.10+0xb6/0x100
   [<ffffffff8158521d>] wext_handle_ioctl+0x5d/0xb0
   [<ffffffff814cfb29>] dev_ioctl+0x329/0x620
   [<ffffffff8149c7f2>] sock_ioctl+0x142/0x2e0
   [<ffffffff811b0140>] do_vfs_ioctl+0x300/0x520
   [<ffffffff811b03e1>] SyS_ioctl+0x81/0xa0
   [<ffffffff815a67d6>] system_call_fastpath+0x1a/0x1f
 irq event stamp: 493416
 hardirqs last  enabled at (493416): [<ffffffff81068a5f>] __cancel_work_timer+0x6f/0x100
 hardirqs last disabled at (493415): [<ffffffff81067e9f>] try_to_grab_pending+0x1f/0x160
 softirqs last  enabled at (493408): [<ffffffff81053ced>] _local_bh_enable+0x1d/0x50
 softirqs last disabled at (493409): [<ffffffff81054c75>] irq_exit+0xa5/0xb0

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock((&(&led_cdev->blink_work)->work));
   <Interrupt>
     lock((&(&led_cdev->blink_work)->work));

  *** DEADLOCK ***

 2 locks held by swapper/0/0:
  #0:  (((&tpt_trig->timer))){+.-...}, at: [<ffffffff810b4c50>] call_timer_fn+0x0/0x180
  #1:  (&trig->leddev_list_lock){.+.?..}, at: [<ffffffffc081e68c>] tpt_trig_timer+0xec/0x170 [mac80211]

 stack backtrace:
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.0-rc3 #1
 Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
  ffffffff8246eb30 ffff88007c203b00 ffffffff8159e97f ffffffff81a194c0
  ffff88007c203b50 ffffffff81599c29 0000000000000001 ffffffff00000001
  ffff880000000000 0000000000000006 ffffffff81a194c0 ffffffff81093ad0
 Call Trace:
  <IRQ>  [<ffffffff8159e97f>] dump_stack+0x4d/0x66
  [<ffffffff81599c29>] print_usage_bug+0x1f4/0x205
  [<ffffffff81093ad0>] ? check_usage_backwards+0x140/0x140
  [<ffffffff810944d3>] mark_lock+0x223/0x2b0
  [<ffffffff81094d60>] __lock_acquire+0x2b0/0x1a30
  [<ffffffff81096c81>] lock_acquire+0x91/0x110
  [<ffffffff810685d0>] ? mod_delayed_work_on+0x80/0x80
  [<ffffffffc081e5a0>] ? __ieee80211_get_rx_led_name+0x10/0x10 [mac80211]
  [<ffffffff81068608>] flush_work+0x38/0x270
  [<ffffffff810685d0>] ? mod_delayed_work_on+0x80/0x80
  [<ffffffff810945ca>] ? mark_held_locks+0x6a/0x90
  [<ffffffff81068a5f>] ? __cancel_work_timer+0x6f/0x100
  [<ffffffffc081e5a0>] ? __ieee80211_get_rx_led_name+0x10/0x10 [mac80211]
  [<ffffffff8109469d>] ? trace_hardirqs_on_caller+0xad/0x1c0
  [<ffffffffc081e5a0>] ? __ieee80211_get_rx_led_name+0x10/0x10 [mac80211]
  [<ffffffff81068a6b>] __cancel_work_timer+0x7b/0x100
  [<ffffffff81068b0e>] cancel_delayed_work_sync+0xe/0x10
  [<ffffffff8147cf3b>] led_blink_set+0x1b/0x40
  [<ffffffffc081e6b0>] tpt_trig_timer+0x110/0x170 [mac80211]
  [<ffffffff810b4cc5>] call_timer_fn+0x75/0x180
  [<ffffffff810b4c50>] ? process_timeout+0x10/0x10
  [<ffffffffc081e5a0>] ? __ieee80211_get_rx_led_name+0x10/0x10 [mac80211]
  [<ffffffff810b50ac>] run_timer_softirq+0x1fc/0x2f0
  [<ffffffff81054805>] __do_softirq+0x115/0x2e0
  [<ffffffff81054c75>] irq_exit+0xa5/0xb0
  [<ffffffff810049b3>] do_IRQ+0x53/0xf0
  [<ffffffff815a74af>] common_interrupt+0x6f/0x6f
  <EOI>  [<ffffffff8147b56e>] ? cpuidle_enter_state+0x6e/0x180
  [<ffffffff8147b732>] cpuidle_enter+0x12/0x20
  [<ffffffff8108bba0>] cpu_startup_entry+0x330/0x360
  [<ffffffff8158fb51>] rest_init+0xc1/0xd0
  [<ffffffff8158fa90>] ? csum_partial_copy_generic+0x170/0x170
  [<ffffffff81af3ff2>] start_kernel+0x44f/0x45a
  [<ffffffff81af399c>] ? set_init_arg+0x53/0x53
  [<ffffffff81af35ad>] x86_64_start_reservations+0x2a/0x2c
  [<ffffffff81af36a0>] x86_64_start_kernel+0xf1/0xf4

Cc: Vincent Donnefort <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
Signed-off-by: Bryan Wu <[email protected]>
AshishNamdev pushed a commit that referenced this pull request Oct 2, 2014
The following bug can be triggered by hot adding and removing a large number of
xen domain0's vcpus repeatedly:

	BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group
	PGD 5a9d5067 PUD 13067 PMD 0
	Oops: 0000 [#3] SMP
	[...]
	Call Trace:
	load_balance
	? _raw_spin_unlock_irqrestore
	idle_balance
	__schedule
	schedule
	schedule_timeout
	? lock_timer_base
	schedule_timeout_uninterruptible
	msleep
	lock_device_hotplug_sysfs
	online_store
	dev_attr_store
	sysfs_write_file
	vfs_write
	SyS_write
	system_call_fastpath

The last level cache shared mask is built during CPU up, and the
build_sched_domain() routine takes advantage of it to set up
the sched domain CPU topology.

However, llc_shared_mask is not released during CPU disable,
which leads to an invalid sched domain CPU topology.

This patch fixes it by releasing llc_shared_mask correctly
during CPU disable.
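
A minimal sketch of the kind of cleanup the patch describes (hypothetical
helper name; the real fix hooks this into the CPU-disable/sibling-teardown
path):

	/* Drop the outgoing CPU from every sibling's LLC shared mask, and
	 * clear its own mask, so a later hot-add rebuilds a consistent
	 * sched domain topology.
	 */
	static void example_clear_llc_shared_mask(int cpu)
	{
		int sibling;

		for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
			cpumask_clear_cpu(cpu, cpu_llc_shared_mask(sibling));
		cpumask_clear(cpu_llc_shared_mask(cpu));
	}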

Yasuaki also reported that this can happen on real hardware:

  https://lkml.org/lkml/2014/7/22/1018

His case is here:

	==
	Here is an example on my system.
	My system has 4 sockets and each socket has 15 cores and HT is
	enabled. In this case, each core of sockets is numbered as
	follows:

		 | CPU#
	Socket#0 | 0-14 , 60-74
	Socket#1 | 15-29, 75-89
	Socket#2 | 30-44, 90-104
	Socket#3 | 45-59, 105-119

	Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.

	It means that last level cache of Socket#2 is shared with
	CPU#30-44 and 90-104.

	When hot-removing socket#2 and #3, each core of sockets is
	numbered as follows:

		 | CPU#
	Socket#0 | 0-14 , 60-74
	Socket#1 | 15-29, 75-89

	But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30
	remains having 0x3fff80000001fffc0000000.

	After that, when hot-adding socket#2 and #3, each core of
	sockets is numbered as follows:

		 | CPU#
	Socket#0 | 0-14 , 60-74
	Socket#1 | 15-29, 75-89
	Socket#2 | 30-59
	Socket#3 | 90-119

	Then llc_shared_mask of CPU#30 becomes
	0x3fff8000fffffffc0000000. It means that last level cache of
	Socket#2 is shared with CPU#30-59 and 90-104. So the mask has
	the wrong value.

Signed-off-by: Wanpeng Li <[email protected]>
Tested-by: Linn Crosetto <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Reviewed-by: Toshi Kani <[email protected]>
Reviewed-by: Yasuaki Ishimatsu <[email protected]>
Cc: <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>