Skip to content
Snippets Groups Projects
This project is mirrored from https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git. Pull mirroring updated .
  1. Apr 27, 2009
  2. Apr 24, 2009
  3. Apr 23, 2009
    • Ingo Molnar's avatar
      locking: clarify kernel-taint warning message · b48ccb09
      Ingo Molnar authored
      
      Andi Kleen reported this message triggering on non-lockdep kernels:
      
         Disabling lockdep due to kernel taint
      
      Clarify the message to say 'lock debugging' - debug_locks_off()
      turns off all things lock debugging, not just lockdep.
      
      [ Impact: change kernel warning message text ]
      
      Reported-by: default avatarAndi Kleen <andi@firstfloor.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      b48ccb09
  4. Apr 21, 2009
  5. Apr 19, 2009
    • Rafael J. Wysocki's avatar
      PM/Suspend: Introduce two new platform callbacks to avoid breakage · 6a7c7eaf
      Rafael J. Wysocki authored
      
      Commit 900af0d9 (PM: Change suspend
      code ordering) changed the ordering of suspend code in such a way
      that the platform .prepare() callback is now executed after the
      device drivers' late suspend callbacks have run.  Unfortunately, this
      turns out to break ARM platforms that need to talk via I2C to power
      control devices during the .prepare() callback.
      
      For this reason introduce two new platform suspend callbacks,
      .prepare_late() and .wake(), that will be called just prior to
      disabling non-boot CPUs and right after bringing them back on line,
      respectively, and use them instead of .prepare() and .finish() for
      ACPI suspend.  Make the PM core execute the .prepare() and .finish()
      platform suspend callbacks where they were executed previously (that
      is, right after calling the regular suspend methods provided by
      device drivers and right before executing their regular resume
      methods, respectively).
      
      It is not necessary to make analogous changes to the hibernation
      code and data structures at the moment, because they are only used
      by ACPI platforms.
      
      Signed-off-by: default avatarRafael J. Wysocki <rjw@sisk.pl>
      Reported-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Acked-by: default avatarLen Brown <len.brown@intel.com>
      6a7c7eaf
    • Linus Torvalds's avatar
      Remove 'recurse into child resources' logic from 'reserve_region_with_split()' · ff54250a
      Linus Torvalds authored
      
      This function is not actually used right now, since the original use
      case for it was done with insert_resource_expand_to_fit() instead.
      
      However, we now have another usage case that wants to basically do a
      "reserve IO resource, splitting around existing resources", however that
      one doesn't actually want the "recurse into the conflicting resource"
      logic at all.
      
      And since recursing into the conflicting resource was the most complex
      part, and isn't wanted, just remove it.  Maybe we'll some day want both
      versions, but we can just resurrect the logic then.
      
      Tested-by: default avatarYinghai Lu <yinghai@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ff54250a
  6. Apr 17, 2009
    • Peter Zijlstra's avatar
      lockdep: more robust lockdep_map init sequence · c8a25005
      Peter Zijlstra authored
      
      Steven Rostedt reported:
      
      > OK, I think I figured this bug out. This is a lockdep issue with respect
      > to tracepoints.
      >
      > The trace points in lockdep are called all the time. Outside the lockdep
      > logic. But if lockdep were to trigger an error / warning (which this run
      > did) we might be in trouble. For new locks, like the dentry->d_lock, that
      > are created, they will not get a name:
      >
      > void lockdep_init_map(struct lockdep_map *lock, const char *name,
      >                       struct lock_class_key *key, int subclass)
      > {
      >         if (unlikely(!debug_locks))
      >                 return;
      >
      > When a problem is found by lockdep, debug_locks becomes false. Thus we
      > stop allocating names for locks. This dentry->d_lock I had, now has no
      > name. Worse yet, I have CONFIG_DEBUG_VM set, that scrambles non
      > initialized memory. Thus, when the trace point was hit, it had junk for
      > the lock->name, and the machine crashed.
      
      Ah, nice catch. I think we should put at least the name in regardless.
      
      Ensure we at least initialize the trivial entries of the depmap so that
      they can be relied upon, even when lockdep itself decided to pack up and
      go home.
      
      [ Impact: fix lock tracing after lockdep warnings. ]
      
      Reported-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1239954049.23397.4156.camel@laptop>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      c8a25005
  7. Apr 16, 2009
  8. Apr 15, 2009
  9. Apr 14, 2009
    • Pallipadi, Venkatesh's avatar
      x86, irq: Remove IRQ_DISABLED check in process context IRQ move · 6ec3cfec
      Pallipadi, Venkatesh authored
      As discussed in the thread here:
      
        http://marc.info/?l=linux-kernel&m=123964468521142&w=2
      
      
      
      Eric W. Biederman observed:
      
      > It looks like some additional bugs have slipped in since last I looked.
      >
      > set_irq_affinity does this:
      > ifdef CONFIG_GENERIC_PENDING_IRQ
      >        if (desc->status & IRQ_MOVE_PCNTXT || desc->status & IRQ_DISABLED) {
      >                cpumask_copy(desc->affinity, cpumask);
      >                desc->chip->set_affinity(irq, cpumask);
      >        } else {
      >                desc->status |= IRQ_MOVE_PENDING;
      >                cpumask_copy(desc->pending_mask, cpumask);
      >        }
      > #else
      >
      > That IRQ_DISABLED case is a software state and as such it has nothing to
      > do with how safe it is to move an irq in process context.
      
      [...]
      
      >
      > The only reason we migrate MSIs in interrupt context today is that there
      > wasn't infrastructure for support migration both in interrupt context
      > and outside of it.
      
      Yes. The idea here was to force the MSI migration to happen in process
      context. One of the patches in the series did
      
              disable_irq(dev->irq);
              irq_set_affinity(dev->irq, cpumask_of(dev->cpu));
              enable_irq(dev->irq);
      
      with the above patch adding irq/manage code check for interrupt disabled
      and moving the interrupt in process context.
      
      IIRC, there was no IRQ_MOVE_PCNTXT when we were developing this HPET
      code and we ended up having this ugly hack. IRQ_MOVE_PCNTXT was there
      when we eventually submitted the patch upstream. But, looks like I did a
      blind rebasing instead of using IRQ_MOVE_PCNTXT in hpet MSI code.
      
      Below patch fixes this. i.e., revert commit 932775a4
      and add PCNTXT to HPET MSI setup. Also removes copying of desc->affinity
      in generic code as set_affinity routines are doing it internally.
      
      Reported-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Acked-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "Li Shaohua" <shaohua.li@intel.com>
      Cc: Gary Hade <garyhade@us.ibm.com>
      Cc: "lcm@us.ibm.com" <lcm@us.ibm.com>
      Cc: suresh.b.siddha@intel.com
      LKML-Reference: <20090413222058.GB8211@linux-os.sc.intel.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      6ec3cfec
    • Paul E. McKenney's avatar
      rcu: Make hierarchical RCU less IPI-happy · ef631b0c
      Paul E. McKenney authored
      
      This patch fixes a hierarchical-RCU performance bug located by Anton
      Blanchard.  The problem stems from a misguided attempt to provide a
      work-around for jiffies-counter failure.  This work-around uses a per-CPU
      n_rcu_pending counter, which is incremented on each call to rcu_pending(),
      which in turn is called from each scheduling-clock interrupt.  Each CPU
      then treats this counter as a surrogate for the jiffies counter, so
      that if the jiffies counter fails to advance, the per-CPU n_rcu_pending
      counter will cause RCU to invoke force_quiescent_state(), which in turn
      will (among other things) send resched IPIs to CPUs that have thus far
      failed to pass through an RCU quiescent state.
      
      Unfortunately, each CPU resets only its own counter after sending a
      batch of IPIs.  This means that the other CPUs will also (needlessly)
      send -another- round of IPIs, for a full N-squared set of IPIs in the
      worst case every three scheduler-clock ticks until the grace period
      finally ends.  It is not reasonable for a given CPU to reset each and
      every n_rcu_pending for all the other CPUs, so this patch instead simply
      disables the jiffies-counter "training wheels", thus eliminating the
      excessive IPIs.
      
      Note that the jiffies-counter IPIs do not have this problem due to
      the fact that the jiffies counter is global, so that the CPU sending
      the IPIs can easily reset things, thus preventing the other CPUs from
      sending redundant IPIs.
      
      Note also that the n_rcu_pending counter remains, as it will continue to
      be used for tracing.  It may also see use to update the jiffies counter,
      should an appropriate kick-the-jiffies-counter API appear.
      
      Located-by: default avatarAnton Blanchard <anton@au1.ibm.com>
      Tested-by: default avatarAnton Blanchard <anton@au1.ibm.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: anton@samba.org
      Cc: akpm@linux-foundation.org
      Cc: dipankar@in.ibm.com
      Cc: manfred@colorfullife.com
      Cc: cl@linux-foundation.org
      Cc: josht@linux.vnet.ibm.com
      Cc: schamp@sgi.com
      Cc: niv@us.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: ego@in.ibm.com
      Cc: laijs@cn.fujitsu.com
      Cc: rostedt@goodmis.org
      Cc: peterz@infradead.org
      Cc: penberg@cs.helsinki.fi
      Cc: andi@firstfloor.org
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      LKML-Reference: <12396834793575-git-send-email->
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      ef631b0c
    • Zhaolei's avatar
      tracing: Fix branch tracer header · 557055be
      Zhaolei authored
      
      Before patch:
      
        # tracer: branch
        #
        #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
        #              | |       |          |         |
                   <...>-2981  [000] 24008.872738: [  ok  ] trace_irq_handler_exit:irq_event_types.h:41
                   <...>-2981  [000] 24008.872742: [  ok  ] note_interrupt:spurious.c:229
        ...
      
      After patch:
      
        # tracer: branch
        #
        #           TASK-PID    CPU#    TIMESTAMP  CORRECT  FUNC:FILE:LINE
        #              | |       |          |         |       |
                   <...>-2985  [000] 26329.142970: [  ok  ] slab_free:slub.c:1776
                   <...>-2985  [000] 26329.142972: [  ok  ] trace_kmem_cache_free:kmem_event_types.h:191
        ...
      
      Signed-off-by: default avatarZhao Lei <zhaolei@cn.fujitsu.com>
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <49E2F19A.3040006@cn.fujitsu.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      557055be
    • Lai Jiangshan's avatar
      tracing, sched: mark get_parent_ip() notrace · 132380a0
      Lai Jiangshan authored
      
      Impact: remove overly redundant tracing entries
      
      When tracer is "function" or "function_graph", way too much
      "get_parent_ip" entries are recorded in ring_buffer.
      
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarSteven Rostedt <srostedt@redhat.com>
      LKML-Reference: <49D458B1.5000703@cn.fujitsu.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      132380a0
  10. Apr 13, 2009
  11. Apr 12, 2009
  12. Apr 11, 2009
    • Linus Torvalds's avatar
      async: Fix module loading async-work regression · d6de2c80
      Linus Torvalds authored
      Several drivers use asynchronous work to do device discovery, and we
      synchronize with them in the compiled-in case before we actually try to
      mount root filesystems etc.
      
      However, when compiled as modules, that synchronization is missing - the
      module loading completes, but the driver hasn't actually finished
      probing for devices, and that means that any user mode that expects to
      use the devices after the 'insmod' is now potentially broken.
      
      We already saw one case of a similar issue in the ACPI battery code,
      where the kernel itself expected the module to be all done, and unmapped
      the init memory - but the async device discovery was still running.
      That got hacked around by just removing the "__init" (see commit
      5d38258e "ACPI battery: fix async boot
      oops"), but the real fix is to just make the module loading wait for all
      async work to be completed.
      
      It will slow down module loading, but since common devices should be
      built in anyway, and since the bug is really annoying and hard to handle
      from user space (and caused several S3 resume regressions), the simple
      fix to wait is the right one.
      
      This fixes at least
      
      	http://bugzilla.kernel.org/show_bug.cgi?id=13063
      
      
      
      but probably a few other bugzilla entries too (12936, for example), and
      is confirmed to fix Rafael's storage driver breakage after resume bug
      report (no bugzilla entry).
      
      We should also be able to now revert that ACPI battery fix.
      
      Reported-and-tested-by: default avatarRafael J. Wysocki <rjw@suse.com>
      Tested-by: default avatarHeinz Diehl <htd@fancy-poultry.org>
      Acked-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d6de2c80
  13. Apr 10, 2009
    • Zhaolei's avatar
      ftrace: Output REC->var instead of __entry->var for trace format · 0462b566
      Zhaolei authored
      
      print fmt: "irq=%d return=%s", __entry->irq, __entry->ret ? \"handled\" : \"unhandled\"
      
      "__entry" should be convert to "REC" by __stringify() macro.
      
      Signed-off-by: default avatarZhao Lei <zhaolei@cn.fujitsu.com>
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <49DC679D.2090901@cn.fujitsu.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      0462b566
    • Li Zefan's avatar
      tracing: fix document references · 4d1f4372
      Li Zefan authored
      
      When moving documents to Documentation/trace/, I forgot to
      grep Kconfig to find out those references.
      
      Signed-off-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Pekka Paalanen <pq@iki.fi>
      Cc: eduard.munteanu@linux360.ro
      LKML-Reference: <49DE97EF.7080208@cn.fujitsu.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      4d1f4372
    • Lai Jiangshan's avatar
      tracing: fix splice return too large · 93cfb3c9
      Lai Jiangshan authored
      
      I got these from strace:
      
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 16384
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
      
      I wanted to splice_read 4096 bytes, but it returns 8192 or larger.
      
      It is because the return value of tracing_buffers_splice_read()
      does not include "zero out any left over data" bytes.
      
      But tracing_buffers_read() includes these bytes, we make them
      consistent.
      
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      LKML-Reference: <49D46674.9030804@cn.fujitsu.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      93cfb3c9
    • Lai Jiangshan's avatar
      tracing: update file->f_pos when splice(2) it · c7625a55
      Lai Jiangshan authored
      
      Impact: Cleanup
      
      These two lines:
      
      	if (unlikely(*ppos))
      		return -ESPIPE;
      
      in tracing_buffers_splice_read() are not needed, VFS layer
      has disabled seek(2).
      
      We remove these two lines, and then we can update file->f_pos.
      
      And tracing_buffers_read() updates file->f_pos, this fix
      make tracing_buffers_splice_read() updates file->f_pos too.
      
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      LKML-Reference: <49D46670.4010503@cn.fujitsu.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      c7625a55
    • Lai Jiangshan's avatar
      tracing: allocate page when needed · ddd538f3
      Lai Jiangshan authored
      
      Impact: Cleanup
      
      Sometimes, we open trace_pipe_raw, but we don't read(2) it,
      we just splice(2) it, thus, the page is not used.
      
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      LKML-Reference: <49D4666B.4010608@cn.fujitsu.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      ddd538f3
    • Lai Jiangshan's avatar
      tracing: disable seeking for trace_pipe_raw · d1e7e02f
      Lai Jiangshan authored
      
      Impact: disable pread()
      
      We set tracing_buffers_fops.llseek to no_llseek,
      but we can still perform pread() to read this file.
      
      That is not expected.
      
      This fix uses nonseekable_open() to disable it.
      
      tracing_buffers_fops.llseek is still set to no_llseek,
      it mark this file is a "non-seekable device" and is used by
      sys_splice(). See also do_splice() or manual of splice(2):
      
      ERRORS
             EINVAL Target file system doesn't support  splicing;
                    neither  of the descriptors refers to a pipe;
                    or offset given for non-seekable device.
      
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      LKML-Reference: <49D46668.8030806@cn.fujitsu.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      d1e7e02f
  14. Apr 09, 2009
    • Heiko Carstens's avatar
      mutex: have non-spinning mutexes on s390 by default · 36cd3c9f
      Heiko Carstens authored
      
      Impact: performance regression fix for s390
      
      The adaptive spinning mutexes will not always do what one would expect on
      virtualized architectures like s390. Especially the cpu_relax() loop in
      mutex_spin_on_owner might hurt if the mutex holding cpu has been scheduled
      away by the hypervisor.
      
      We would end up in a cpu_relax() loop when there is no chance that the
      state of the mutex changes until the target cpu has been scheduled again by
      the hypervisor.
      
      For that reason we should change the default behaviour to no-spin on s390.
      
      We do have an instruction which allows to yield the current cpu in favour of
      a different target cpu. Also we have an instruction which allows us to figure
      out if the target cpu is physically backed.
      
      However we need to do some performance tests until we can come up with
      a solution that will do the right thing on s390.
      
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      LKML-Reference: <20090409184834.7a0df7b2@osiris.boeblingen.de.ibm.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      36cd3c9f
    • Li Zefan's avatar
      blktrace: pass the right pointer to kfree() · 9eb85125
      Li Zefan authored
      
      Impact: fix kfree crash with non-standard act_mask string
      
      If passing a string with leading white spaces to strstrip(),
      the returned ptr != the original ptr.
      
      This bug was introduced by me.
      
      Signed-off-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <49DD694C.8020902@cn.fujitsu.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      9eb85125
    • Frederic Weisbecker's avatar
      tracing/syscalls: use a dedicated file header · 47788c58
      Frederic Weisbecker authored
      
      Impact: fix build warnings and possibe compat misbehavior on IA64
      
      Building a kernel on ia64 might trigger these ugly build warnings:
      
      CC      arch/ia64/ia32/sys_ia32.o
      In file included from arch/ia64/ia32/sys_ia32.c:55:
      arch/ia64/ia32/ia32priv.h:290:1: warning: "elf_check_arch" redefined
      In file included from include/linux/elf.h:7,
                       from include/linux/module.h:14,
                       from include/linux/ftrace.h:8,
                       from include/linux/syscalls.h:68,
                       from arch/ia64/ia32/sys_ia32.c:18:
      arch/ia64/include/asm/elf.h:19:1: warning: this is the location of the previous definition
      [...]
      
      sys_ia32.c includes linux/syscalls.h which in turn includes linux/ftrace.h
      to import the syscalls tracing prototypes.
      
      But including ftrace.h can pull too much things for a low level file,
      especially on ia64 where the ia32 private headers conflict with higher
      level headers.
      
      Now we isolate the syscall tracing headers in their own lightweight file.
      
      Reported-by: default avatarTony Luck <tony.luck@intel.com>
      Tested-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Michael Davidson <md@google.com>
      LKML-Reference: <20090408184058.GB6017@nowhere>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      47788c58
    • Andrew Morton's avatar
      work_on_cpu(): rewrite it to create a kernel thread on demand · 6b44003e
      Andrew Morton authored
      
      Impact: circular locking bugfix
      
      The various implemetnations and proposed implemetnations of work_on_cpu()
      are vulnerable to various deadlocks because they all used queues of some
      form.
      
      Unrelated pieces of kernel code thus gained dependencies wherein if one
      work_on_cpu() caller holds a lock which some other work_on_cpu() callback
      also takes, the kernel could rarely deadlock.
      
      Fix this by creating a short-lived kernel thread for each work_on_cpu()
      invokation.
      
      This is not terribly fast, but the only current caller of work_on_cpu() is
      pci_call_probe().
      
      It would be nice to find some other way of doing the node-local
      allocations in the PCI probe code so that we can zap work_on_cpu()
      altogether.  The code there is rather nasty.  I can't think of anything
      simple at this time...
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      6b44003e
Loading