commit 8861fd33e82b43723557cd50f4d4c250cf9a3b0f Author: Greg Kroah-Hartman Date: Thu Jun 20 11:59:06 2013 -0700 Linux 3.4.50 commit 05266fa348c3a30a3fef2c2993ca73e28b17232f Author: Benjamin Herrenschmidt Date: Sat Jun 15 12:13:40 2013 +1000 powerpc: Fix missing/delayed calls to irq_work commit 230b3034793247f61e6a0b08c44cf415f6d92981 upstream. When replaying interrupts (as a result of the interrupt occurring while soft-disabled), in the case of the decrementer, we are exclusively testing for a pending timer target. However we also use decrementer interrupts to trigger the new "irq_work", which in this case would be missed. This change the logic to force a replay in both cases of a timer boundary reached and a decrementer interrupt having actually occurred while disabled. The former test is still useful to catch cases where a CPU having been hard-disabled for a long time completely misses the interrupt due to a decrementer rollover. Signed-off-by: Benjamin Herrenschmidt Tested-by: Steven Rostedt Signed-off-by: Greg Kroah-Hartman commit 0031e5e3e3cd8a21fe1224e0e51d646ca4099eab Author: Michael Ellerman Date: Thu Jun 13 21:04:56 2013 +1000 powerpc: Fix stack overflow crash in resume_kernel when ftracing commit 0e37739b1c96d65e6433998454985de994383019 upstream. It's possible for us to crash when running with ftrace enabled, eg: Bad kernel stack pointer bffffd12 at c00000000000a454 cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40] pc: c00000000000a454: resume_kernel+0x34/0x60 lr: c00000000000335c: performance_monitor_common+0x15c/0x180 sp: bffffd12 msr: 8000000000001032 dar: bffffd12 dsisr: 42000000 If we look at current's stack (paca->__current->stack) we see it is equal to c0000002ecab0000. Our stack is 16K, and comparing to paca->kstack (c0000002ecab3e30) we can see that we have overflowed our kernel stack. This leads to us writing over our struct thread_info, and in this case we have corrupted thread_info->flags and set _TIF_EMULATE_STACK_STORE. Dumping the stack we see: 3:mon> t c0000002ecab0000 [c0000002ecab0000] c00000000002131c .performance_monitor_exception+0x5c/0x70 [c0000002ecab0080] c00000000000335c performance_monitor_common+0x15c/0x180 --- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30 [c0000002ecab0370] c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable) [c0000002ecab0410] c00000000003d038 .prepare_ftrace_return+0x98/0x130 [c0000002ecab04b0] c00000000000a920 .ftrace_graph_caller+0x14/0x28 [c0000002ecab0520] c0000000000d6b58 .idle_cpu+0x18/0x90 [c0000002ecab05a0] c00000000000a934 .return_to_handler+0x0/0x34 [c0000002ecab0620] c00000000001e660 .timer_interrupt+0x160/0x300 [c0000002ecab06d0] c0000000000025dc decrementer_common+0x15c/0x180 --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0 [c0000002ecab09c0] c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable) [c0000002ecab0fb0] c00000000016fe3c .trace_graph_entry+0x13c/0x280 [c0000002ecab1050] c00000000003d038 .prepare_ftrace_return+0x98/0x130 [c0000002ecab10f0] c00000000000a920 .ftrace_graph_caller+0x14/0x28 [c0000002ecab1160] c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40 [c0000002ecab11d0] c00000000000a934 .return_to_handler+0x0/0x34 --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0 ... and so on __ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry path. At that point the irq state is not consistent, ie. interrupts are hard disabled (by the exception entry), but the paca soft-enabled flag may be out of sync. This leads to the local_irq_restore() in trace_graph_entry() actually enabling interrupts, which we do not want. Because we have not yet reprogrammed the decrementer we immediately take another decrementer exception, and recurse. The fix is twofold. Firstly make sure we call DISABLE_INTS before calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles the irq state in the paca with the hardware, making it safe again to call local_irq_save/restore(). Although that should be sufficient to fix the bug, we also mark the runlatch routines as notrace. They are called very early in the exception entry and we are asking for trouble tracing them. They are also fairly uninteresting and tracing them just adds unnecessary overhead. [ This regression was introduced by fe1952fc0afb9a2e4c79f103c08aef5d13db1873 "powerpc: Rework runlatch code" by myself --BenH ] Signed-off-by: Michael Ellerman Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Greg Kroah-Hartman commit 999899ad5860fb34e679ca2a853074e54fda9d93 Author: Sage Weil Date: Fri Feb 22 15:31:00 2013 -0800 ceph: fix statvfs fr_size commit 92a49fb0f79f3300e6e50ddf56238e70678e4202 upstream. Different versions of glibc are broken in different ways, but the short of it is that for the time being, frsize should == bsize, and be used as the multiple for the blocks, free, and available fields. This mirrors what is done for NFS. The previous reporting of the page size for frsize meant that newer glibc and df would report a very small value for the fs size. Fixes http://tracker.ceph.com/issues/3793. Signed-off-by: Sage Weil Reviewed-by: Greg Farnum Signed-off-by: Greg Kroah-Hartman commit 7f259658b1f320b35040a14d7ace371b5cc15fbb Author: Sage Weil Date: Mon Mar 25 10:26:30 2013 -0700 libceph: wrap auth methods in a mutex commit e9966076cdd952e19f2dd4854cd719be0d7cbebc upstream. The auth code is called from a variety of contexts, include the mon_client (protected by the monc's mutex) and the messenger callbacks (currently protected by nothing). Avoid chaos by protecting all auth state with a mutex. Nothing is blocking, so this should be simple and lightweight. Signed-off-by: Sage Weil Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit aa80dd9dbe86743ae6e52c836f6ab1472c469927 Author: Sage Weil Date: Mon Mar 25 10:26:14 2013 -0700 libceph: wrap auth ops in wrapper functions commit 27859f9773e4a0b2042435b13400ee2c891a61f4 upstream. Use wrapper functions that check whether the auth op exists so that callers do not need a bunch of conditional checks. Simplifies the external interface. Signed-off-by: Sage Weil Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit 29c65a277a64645af853e8c9a9b3dda0ddc421e0 Author: Sage Weil Date: Mon Mar 25 10:26:01 2013 -0700 libceph: add update_authorizer auth method commit 0bed9b5c523d577378b6f83eab5835fe30c27208 upstream. Currently the messenger calls out to a get_authorizer con op, which will create a new authorizer if it doesn't yet have one. In the meantime, when we rotate our service keys, the authorizer doesn't get updated. Eventually it will be rejected by the server on a new connection attempt and get invalidated, and we will then rebuild a new authorizer, but this is not ideal. Instead, if we do have an authorizer, call a new update_authorizer op that will verify that the current authorizer is using the latest secret. If it is not, we will build a new one that does. This avoids the transient failure. This fixes one of the sorry sequence of events for bug http://tracker.ceph.com/issues/4282 Signed-off-by: Sage Weil Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit aacd9c3626bac2960bbecd35cc6f032f8529d90b Author: Sage Weil Date: Mon Mar 25 10:25:49 2013 -0700 libceph: fix authorizer invalidation commit 4b8e8b5d78b8322351d44487c1b76f7e9d3412bc upstream. We were invalidating the authorizer by removing the ticket handler entirely. This was effective in inducing us to request a new authorizer, but in the meantime it mean that any authorizer we generated would get a new and initialized handler with secret_id=0, which would always be rejected by the server side with a confusing error message: auth: could not find secret_id=0 cephx: verify_authorizer could not get service secret for service osd secret_id=0 Instead, simply clear the validity field. This will still induce the auth code to request a new secret, but will let us continue to use the old ticket in the meantime. The messenger code will probably continue to fail, but the exponential backoff will kick in, and eventually the we will get a new (hopefully more valid) ticket from the mon and be able to continue. Signed-off-by: Sage Weil Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit af53bc4db606a4ce179a4cef0f1d2a64276b08bc Author: Sage Weil Date: Mon Mar 25 09:30:13 2013 -0700 libceph: clear messenger auth_retry flag when we authenticate commit 20e55c4cc758e4dccdfd92ae8e9588dd624b2cd7 upstream. We maintain a counter of failed auth attempts to allow us to retry once before failing. However, if the second attempt succeeds, the flag isn't cleared, which makes us think auth failed again later when the connection resets for other reasons (like a socket error). This is one part of the sorry sequence of events in bug http://tracker.ceph.com/issues/4282 Signed-off-by: Sage Weil Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit 5c7a854c966cc6370ab2d4595f7cee5dd02d87c1 Author: Kees Cook Date: Wed Jun 5 11:47:18 2013 -0700 x86: Fix typo in kexec register clearing commit c8a22d19dd238ede87aa0ac4f7dbea8da039b9c1 upstream. Fixes a typo in register clearing code. Thanks to PaX Team for fixing this originally, and James Troup for pointing it out. Signed-off-by: Kees Cook Link: http://lkml.kernel.org/r/20130605184718.GA8396@www.outflux.net Cc: PaX Team Signed-off-by: H. Peter Anvin Signed-off-by: Greg Kroah-Hartman commit f517dfe6d161bf37a9355c86ea5cb605b06d5963 Author: Naoya Horiguchi Date: Wed Jun 12 14:05:04 2013 -0700 mm: migration: add migrate_entry_wait_huge() commit 30dad30922ccc733cfdbfe232090cf674dc374dc upstream. When we have a page fault for the address which is backed by a hugepage under migration, the kernel can't wait correctly and do busy looping on hugepage fault until the migration finishes. As a result, users who try to kick hugepage migration (via soft offlining, for example) occasionally experience long delay or soft lockup. This is because pte_offset_map_lock() can't get a correct migration entry or a correct page table lock for hugepage. This patch introduces migration_entry_wait_huge() to solve this. Signed-off-by: Naoya Horiguchi Reviewed-by: Rik van Riel Reviewed-by: Wanpeng Li Reviewed-by: Michal Hocko Cc: Mel Gorman Cc: Andi Kleen Cc: KOSAKI Motohiro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 0938e135aa8513f9bc379a408d3c6c1fd24eb46a Author: Alex Lyakas Date: Tue Jun 4 20:42:21 2013 +0300 md/raid1: consider WRITE as successful only if at least one non-Faulty and non-rebuilding drive completed it. commit 3056e3aec8d8ba61a0710fb78b2d562600aa2ea7 upstream. Without that fix, the following scenario could happen: - RAID1 with drives A and B; drive B was freshly-added and is rebuilding - Drive A fails - WRITE request arrives to the array. It is failed by drive A, so r1_bio is marked as R1BIO_WriteError, but the rebuilding drive B succeeds in writing it, so the same r1_bio is marked as R1BIO_Uptodate. - r1_bio arrives to handle_write_finished, badblocks are disabled, md_error()->error() does nothing because we don't fail the last drive of raid1 - raid_end_bio_io() calls call_bio_endio() - As a result, in call_bio_endio(): if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) clear_bit(BIO_UPTODATE, &bio->bi_flags); this code doesn't clear the BIO_UPTODATE flag, and the whole master WRITE succeeds, back to the upper layer. So we returned success to the upper layer, even though we had written the data onto the rebuilding drive only. But when we want to read the data back, we would not read from the rebuilding drive, so this data is lost. [neilb - applied identical change to raid10 as well] This bug can result in lost data, so it is suitable for any -stable kernel. Signed-off-by: Alex Lyakas Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman commit c09c35b2ae5ea7f62b0fd5369935b8e6af25e9cd Author: Rafael Aquini Date: Wed Jun 12 14:04:49 2013 -0700 swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion commit cbab0e4eec299e9059199ebe6daf48730be46d2b upstream. read_swap_cache_async() can race against get_swap_page(), and stumble across a SWAP_HAS_CACHE entry in the swap map whose page wasn't brought into the swapcache yet. This transient swap_map state is expected to be transitory, but the actual placement of discard at scan_swap_map() inserts a wait for I/O completion thus making the thread at read_swap_cache_async() to loop around its -EEXIST case, while the other end at get_swap_page() is scheduled away at scan_swap_map(). This can leave the system deadlocked if the I/O completion happens to be waiting on the CPU waitqueue where read_swap_cache_async() is busy looping and !CONFIG_PREEMPT. This patch introduces a cond_resched() call to make the aforementioned read_swap_cache_async() busy loop condition to bail out when necessary, thus avoiding the subtle race window. Signed-off-by: Rafael Aquini Acked-by: Johannes Weiner Acked-by: KOSAKI Motohiro Acked-by: Hugh Dickins Cc: Shaohua Li Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 508056cc78435675c3f77dbb8de03f5b5ebe7d17 Author: Daniel Vetter Date: Mon Jun 10 09:47:58 2013 +0200 drm/i915: prefer VBT modes for SVDO-LVDS over EDID commit c3456fb3e4712d0448592af3c5d644c9472cd3c1 upstream. In commit 53d3b4d7778daf15900867336c85d3f8dd70600c Author: Egbert Eich Date: Tue Jun 4 17:13:21 2013 +0200 drm/i915/sdvo: Use &intel_sdvo->ddc instead of intel_sdvo->i2c for DDC Egbert Eich fixed a long-standing bug where we simply used a non-working i2c controller to read the EDID for SDVO-LVDS panels. Unfortunately some machines seem to not be able to cope with the mode provided in the EDID. Specifically they seem to not be able to cope with a 4x pixel mutliplier instead of a 2x one, which seems to have been worked around by slightly changing the panels native mode in the VBT so that the dotclock is just barely above 50MHz. Since it took forever to notice the breakage it's fairly safe to assume that at least for SDVO-LVDS panels the VBT contains fairly sane data. So just switch around the order and use VBT modes first. v2: Also add EDID modes just in case, and spell Egbert correctly. v3: Elaborate a bit more about what's going on on Chris' machine. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65524 Reported-and-tested-by: Chris Wilson Cc: Egbert Eich Signed-off-by: Daniel Vetter Signed-off-by: Greg Kroah-Hartman commit 7aaa3421174ff9197d2f9e5f937b0faed738c951 Author: Stephen M. Cameron Date: Wed Jun 12 14:04:47 2013 -0700 cciss: fix broken mutex usage in ioctl commit 03f47e888daf56c8e9046c674719a0bcc644eed5 upstream. If a new logical drive is added and the CCISS_REGNEWD ioctl is invoked (as is normal with the Array Configuration Utility) the process will hang as below. It attempts to acquire the same mutex twice, once in do_ioctl() and once in cciss_unlocked_open(). The BKL was recursive, the mutex isn't. Linux version 3.10.0-rc2 (scameron@localhost.localdomain) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Fri May 24 14:32:12 CDT 2013 [...] acu D 0000000000000001 0 3246 3191 0x00000080 Call Trace: schedule+0x29/0x70 schedule_preempt_disabled+0xe/0x10 __mutex_lock_slowpath+0x17b/0x220 mutex_lock+0x2b/0x50 cciss_unlocked_open+0x2f/0x110 [cciss] __blkdev_get+0xd3/0x470 blkdev_get+0x5c/0x1e0 register_disk+0x182/0x1a0 add_disk+0x17c/0x310 cciss_add_disk+0x13a/0x170 [cciss] cciss_update_drive_info+0x39b/0x480 [cciss] rebuild_lun_table+0x258/0x370 [cciss] cciss_ioctl+0x34f/0x470 [cciss] do_ioctl+0x49/0x70 [cciss] __blkdev_driver_ioctl+0x28/0x30 blkdev_ioctl+0x200/0x7b0 block_ioctl+0x3c/0x40 do_vfs_ioctl+0x89/0x350 SyS_ioctl+0xa1/0xb0 system_call_fastpath+0x16/0x1b This mutex usage was added into the ioctl path when the big kernel lock was removed. As it turns out, these paths are all thread safe anyway (or can easily be made so) and we don't want ioctl() to be single threaded in any case. Signed-off-by: Stephen M. Cameron Cc: Jens Axboe Cc: Mike Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit fc1cbc74d5de169112578e4479d48408b222e325 Author: Robin Holt Date: Wed Jun 12 14:04:37 2013 -0700 reboot: rigrate shutdown/reboot to boot cpu commit cf7df378aa4ff7da3a44769b7ff6e9eef1a9f3db upstream. We recently noticed that reboot of a 1024 cpu machine takes approx 16 minutes of just stopping the cpus. The slowdown was tracked to commit f96972f2dc63 ("kernel/sys.c: call disable_nonboot_cpus() in kernel_restart()"). The current implementation does all the work of hot removing the cpus before halting the system. We are switching to just migrating to the boot cpu and then continuing with shutdown/reboot. This also has the effect of not breaking x86's command line parameter for specifying the reboot cpu. Note, this code was shamelessly copied from arch/x86/kernel/reboot.c with bits removed pertaining to the reboot_cpu command line parameter. Signed-off-by: Robin Holt Tested-by: Shawn Guo Cc: "Srivatsa S. Bhat" Cc: H. Peter Anvin Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Russ Anderson Cc: Robin Holt Cc: Russell King Cc: Guan Xuetao Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit b3cba474228862814480d40554f77e98483f41ed Author: Srivatsa S. Bhat Date: Wed Jun 12 14:04:36 2013 -0700 CPU hotplug: provide a generic helper to disable/enable CPU hotplug commit 16e53dbf10a2d7e228709a7286310e629ede5e45 upstream. There are instances in the kernel where we would like to disable CPU hotplug (from sysfs) during some important operation. Today the freezer code depends on this and the code to do it was kinda tailor-made for that. Restructure the code and make it generic enough to be useful for other usecases too. Signed-off-by: Srivatsa S. Bhat Signed-off-by: Robin Holt Cc: H. Peter Anvin Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Russ Anderson Cc: Robin Holt Cc: Russell King Cc: Guan Xuetao Cc: Shawn Guo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 91c930674642f4b849d64398db261ad13c3ab354 Author: Sujith Manoharan Date: Thu Jun 6 10:06:29 2013 +0530 ath9k: Use minstrel rate control by default commit 5efac94999ff218e0101f67a059e44abb4b0b523 upstream. The ath9k rate control algorithm has various architectural issues that make it a poor fit in scenarios like congested environments etc. An example: https://bugzilla.redhat.com/show_bug.cgi?id=927191 Change the default to minstrel which is more robust in such cases. The ath9k RC code is left in the driver for now, maybe it can be removed altogether later on. Signed-off-by: Sujith Manoharan Cc: Jouni Malinen Cc: Linus Torvalds Signed-off-by: John W. Linville Signed-off-by: Greg Kroah-Hartman commit 75b780e57baec8085beba0bf0e8ed9df68a24fb7 Author: Sujith Manoharan Date: Sat Jun 1 07:08:09 2013 +0530 ath9k: Disable PowerSave by default commit 531671cb17af07281e6f28c1425f754346e65c41 upstream. Almost all the DMA issues which have plagued ath9k (in station mode) for years are related to PS. Disabling PS usually "fixes" the user's connection stablility. Reports of DMA problems are still trickling in and are sitting in the kernel bugzilla. Until the PS code in ath9k is given a thorough review, disbale it by default. The slight increase in chip power consumption is a small price to pay for improved link stability. Signed-off-by: Sujith Manoharan Signed-off-by: John W. Linville Signed-off-by: Greg Kroah-Hartman commit 2147439889d33148785d20cdff01491f5617965f Author: Johan Hedberg Date: Wed May 29 09:51:29 2013 +0300 Bluetooth: Fix mgmt handling of power on failures commit 96570ffcca0b872dc8626e97569d2697f374d868 upstream. If hci_dev_open fails we need to ensure that the corresponding mgmt_set_powered command gets an appropriate response. This patch fixes the missing response by adding a new mgmt_set_powered_failed function that's used to indicate a power on failure to mgmt. Since a situation with the device being rfkilled may require special handling in user space the patch uses a new dedicated mgmt status code for this. Signed-off-by: Johan Hedberg Acked-by: Marcel Holtmann Signed-off-by: Gustavo Padovan Signed-off-by: John W. Linville Signed-off-by: Greg Kroah-Hartman commit b507d034be0418dd2dee4fe3d40f389b1818b37a Author: Patrik Jakobsson Date: Sat Jun 8 20:23:08 2013 +0200 drm/gma500/cdv: Unpin framebuffer on crtc disable commit 22e7c385a80d771aaf3a15ae7ccea3b0686bbe10 upstream. The framebuffer needs to be unpinned in the crtc->disable callback because of previous pinning in psb_intel_pipe_set_base(). This will fix a memory leak where the framebuffer was released but not unpinned properly. This patch only affects Cedarview. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=889511 Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=812113 Reviewed-by: Daniel Vetter Signed-off-by: Patrik Jakobsson Signed-off-by: Greg Kroah-Hartman commit 11882ca48a89c8208a5550b865efd3f4ab9fe3a4 Author: Patrik Jakobsson Date: Wed Jun 5 14:24:01 2013 +0200 drm/gma500/psb: Unpin framebuffer on crtc disable commit 820de86a90089ee607d7864538c98a23b503c846 upstream. The framebuffer needs to be unpinned in the crtc->disable callback because of previous pinning in psb_intel_pipe_set_base(). This will fix a memory leak where the framebuffer was released but not unpinned properly. This patch only affects Poulsbo. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=889511 Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=812113 Reviewed-by: Daniel Vetter Signed-off-by: Patrik Jakobsson Signed-off-by: Greg Kroah-Hartman commit e053ee3fe9c40827f2a3fce73b36f2f892a43292 Author: Tony Lindgren Date: Wed Jun 12 14:04:48 2013 -0700 drivers/rtc/rtc-twl.c: fix missing device_init_wakeup() when booted with device tree commit 24b8256a1fb28d357bc6fa09184ba29b4255ba5c upstream. When booted in legacy mode device_init_wakeup() gets called by drivers/mfd/twl-core.c when the children are initialized. However, when booted using device tree, the children are created with of_platform_populate() instead add_children(). This means that the RTC driver will not have device_init_wakeup() set, and we need to call it from the driver probe like RTC drivers typically do. Without this we cannot test PM wake-up events on omaps for cases where there may not be any physical wake-up event. Signed-off-by: Tony Lindgren Reported-by: Kevin Hilman Cc: Alessandro Zummo Cc: Jingoo Han Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit c7ed0848c61d75c58f1c5b13c308f7fcaf94ed89 Author: Jim Schutt Date: Wed May 15 13:03:35 2013 -0500 ceph: ceph_pagelist_append might sleep while atomic commit 39be95e9c8c0b5668c9f8806ffe29bf9f4bc0f40 upstream. Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc() while holding a lock, but it's spoiled because ceph_pagelist_addpage() always calls kmap(), which might sleep. Here's the result: [13439.295457] ceph: mds0 reconnect start [13439.300572] BUG: sleeping function called from invalid context at include/linux/highmem.h:58 [13439.309243] in_atomic(): 1, irqs_disabled(): 0, pid: 12059, name: kworker/1:1 . . . [13439.376225] Call Trace: [13439.378757] [] __might_sleep+0xfc/0x110 [13439.384353] [] ceph_pagelist_append+0x120/0x1b0 [libceph] [13439.391491] [] ceph_encode_locks+0x89/0x190 [ceph] [13439.398035] [] ? _raw_spin_lock+0x49/0x50 [13439.403775] [] ? lock_flocks+0x15/0x20 [13439.409277] [] encode_caps_cb+0x41f/0x4a0 [ceph] [13439.415622] [] ? igrab+0x28/0x70 [13439.420610] [] ? iterate_session_caps+0xe8/0x250 [ceph] [13439.427584] [] iterate_session_caps+0x115/0x250 [ceph] [13439.434499] [] ? set_request_path_attr+0x2d0/0x2d0 [ceph] [13439.441646] [] send_mds_reconnect+0x238/0x450 [ceph] [13439.448363] [] ? ceph_mdsmap_decode+0x5e2/0x770 [ceph] [13439.455250] [] check_new_map+0x352/0x500 [ceph] [13439.461534] [] ceph_mdsc_handle_map+0x1bd/0x260 [ceph] [13439.468432] [] ? mutex_unlock+0xe/0x10 [13439.473934] [] extra_mon_dispatch+0x22/0x30 [ceph] [13439.480464] [] dispatch+0xbc/0x110 [libceph] [13439.486492] [] process_message+0x1ad/0x1d0 [libceph] [13439.493190] [] ? read_partial_message+0x3e8/0x520 [libceph] . . . [13439.587132] ceph: mds0 reconnect success [13490.720032] ceph: mds0 caps stale [13501.235257] ceph: mds0 recovery completed [13501.300419] ceph: mds0 caps renewed Fix it up by encoding locks into a buffer first, and when the number of encoded locks is stable, copy that into a ceph_pagelist. [elder@inktank.com: abbreviated the stack info a bit.] Signed-off-by: Jim Schutt Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit a169043d55452c50a80673f47ab30763cdba407a Author: Jim Schutt Date: Wed May 15 13:03:35 2013 -0500 ceph: add cpu_to_le32() calls when encoding a reconnect capability commit c420276a532a10ef59849adc2681f45306166b89 upstream. In his review, Alex Elder mentioned that he hadn't checked that num_fcntl_locks and num_flock_locks were properly decoded on the server side, from a le32 over-the-wire type to a cpu type. I checked, and AFAICS it is done; those interested can consult Locker::_do_cap_update() in src/mds/Locker.cc and src/include/encoding.h in the Ceph server code (git://github.com/ceph/ceph). I also checked the server side for flock_len decoding, and I believe that also happens correctly, by virtue of having been declared __le32 in struct ceph_mds_cap_reconnect, in src/include/ceph_fs.h. Signed-off-by: Jim Schutt Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit b84a4fc13ba928644833672442d29f6c7dff2622 Author: Alex Elder Date: Wed May 15 16:28:33 2013 -0500 libceph: must hold mutex for reset_changed_osds() commit 14d2f38df67fadee34625fcbd282ee22514c4846 upstream. An osd client has a red-black tree describing its osds, and occasionally we would get crashes due to one of these trees tree becoming corrupt somehow. The problem turned out to be that reset_changed_osds() was being called without protection of the osd client request mutex. That function would call __reset_osd() for any osd that had changed, and __reset_osd() would call __remove_osd() for any osd with no outstanding requests, and finally __remove_osd() would remove the corresponding entry from the red-black tree. Thus, the tree was getting modified without having any lock protection, and was vulnerable to problems due to concurrent updates. This appears to be the only osd tree updating path that has this problem. It can be fairly easily fixed by moving the call up a few lines, to just before the request mutex gets dropped in kick_requests(). This resolves: http://tracker.ceph.com/issues/5043 Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit d2305b5a32d15d0156115c24adcd8ecfa30209c7 Author: Kees Cook Date: Fri May 10 14:48:21 2013 -0700 b43: stop format string leaking into error msgs commit e0e29b683d6784ef59bbc914eac85a04b650e63c upstream. The module parameter "fwpostfix" is userspace controllable, unfiltered, and is used to define the firmware filename. b43_do_request_fw() populates ctx->errors[] on error, containing the firmware filename. b43err() parses its arguments as a format string. For systems with b43 hardware, this could lead to a uid-0 to ring-0 escalation. CVE-2013-2852 Signed-off-by: Kees Cook Signed-off-by: John W. Linville Signed-off-by: Greg Kroah-Hartman