Linux Kernel 4.4.x (Ubuntu 16.04) – ‘double-fdput()’ bpf(BPF_PROG_LOAD) Privilege Escalation

  • Author: Google Security Research
    Date: 2016-05-04
  • Category:
    Platform:
  • Source: https://www.exploit-db.com/exploits/39772/
  • Source: https://bugs.chromium.org/p/project-zero/issues/detail?id=808
    
    In Linux >=4.4, when the CONFIG_BPF_SYSCALL config option is set and the
    kernel.unprivileged_bpf_disabled sysctl is not explicitly set to 1 at runtime,
    unprivileged code can use the bpf() syscall to load eBPF socket filter programs.
    These conditions are fulfilled in Ubuntu 16.04.
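
    The sysctl precondition can be checked from userspace by reading the value
    through procfs. The following is a minimal sketch, assuming the usual
    /proc/sys/kernel/unprivileged_bpf_disabled path. The helper names
    parse_bpf_disabled() and unprivileged_bpf_allowed() are illustrative and do
    not come from the original report, and the sketch does not check whether
    CONFIG_BPF_SYSCALL is set.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: interpret the text read from the sysctl file.
 * Returns 1 if the value is nonzero (unprivileged bpf() disabled),
 * 0 if it is zero, and -1 if the text does not start with a number. */
int parse_bpf_disabled(const char *text)
{
	char *end;
	long v = strtol(text, &end, 10);

	if (end == text)
		return -1;
	return v != 0;
}

/* Hypothetical helper: returns 1 if the sysctl reports that unprivileged
 * bpf() is allowed, 0 if it is disabled, and -1 if the sysctl cannot be
 * read (for example on a kernel that does not provide it). */
int unprivileged_bpf_allowed(void)
{
	char buf[32];
	FILE *fp = fopen("/proc/sys/kernel/unprivileged_bpf_disabled", "r");
	int disabled;

	if (!fp)
		return -1;
	if (!fgets(buf, sizeof(buf), fp)) {
		fclose(fp);
		return -1;
	}
	fclose(fp);

	disabled = parse_bpf_disabled(buf);
	return disabled < 0 ? -1 : !disabled;
}
```

    A return value of 1 from unprivileged_bpf_allowed() only means that the
    sysctl does not block unprivileged use of bpf(); it says nothing about
    whether a given kernel contains the bug described below.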
    
    When an eBPF program is loaded using bpf(BPF_PROG_LOAD, ...), the first
    function that touches the supplied eBPF instructions is
    replace_map_fd_with_map_ptr(), which looks for instructions that reference eBPF
    map file descriptors and looks up pointers for the corresponding map files.
    This is done as follows:
    
    	/* look for pseudo eBPF instructions that access map FDs and
    	 * replace them with actual map pointers
    	 */
    	static int replace_map_fd_with_map_ptr(struct verifier_env *env)
    	{
    		struct bpf_insn *insn = env->prog->insnsi;
    		int insn_cnt = env->prog->len;
    		int i, j;
    
    		for (i = 0; i < insn_cnt; i++, insn++) {
    			[checks for bad instructions]
    
    			if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW)) {
    				struct bpf_map *map;
    				struct fd f;
    
    				[checks for bad instructions]
    
    				f = fdget(insn->imm);
    				map = __bpf_map_get(f);
    				if (IS_ERR(map)) {
    					verbose("fd %d is not pointing to valid bpf_map\n",
    						insn->imm);
    					fdput(f);
    					return PTR_ERR(map);
    				}
    
    				[...]
    			}
    		}
    		[...]
    	}
    
    
    __bpf_map_get contains the following code:
    
    	/* if error is returned, fd is released.
    	 * On success caller should complete fd access with matching fdput()
    	 */
    	struct bpf_map *__bpf_map_get(struct fd f)
    	{
    		if (!f.file)
    			return ERR_PTR(-EBADF);
    		if (f.file->f_op != &bpf_map_fops) {
    			fdput(f);
    			return ERR_PTR(-EINVAL);
    		}
    
    		return f.file->private_data;
    	}
    
    The problem is that when the caller supplies a file descriptor number referring
    to a struct file that is not an eBPF map, both __bpf_map_get() and
    replace_map_fd_with_map_ptr() will call fdput() on the struct fd. If
    __fget_light() detected that the file descriptor table is shared with another
    task and therefore the FDPUT_FPUT flag is set in the struct fd, this will cause
    the reference count of the struct file to be over-decremented, allowing an
    attacker to create a use-after-free situation where a struct file is freed
    although there are still references to it.
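
    The reference counting involved can be illustrated with a small userspace
    model. This is only a sketch: the model_* functions and the simplified
    struct file and struct fd below are stand-ins invented for illustration,
    not kernel code. The caller_puts_on_error flag merely toggles one of the
    two fdput() calls so that the balanced case can be compared with the
    double put; the fix commit referenced in this report shows the actual
    upstream change.

```c
#include <stdbool.h>
#include <stddef.h>

#define FDPUT_FPUT 1

/* Simplified stand-ins for the kernel's struct file and struct fd. */
struct file {
	long f_count;		/* reference count */
	bool is_bpf_map;	/* models f_op == &bpf_map_fops */
};

struct fd {
	struct file *file;
	unsigned int flags;
};

/* Models fdget(): a reference is taken (and FDPUT_FPUT is set) only when
 * the file descriptor table is shared with another task. */
struct fd model_fdget(struct file *file, bool table_shared)
{
	struct fd f = { file, 0 };

	if (file && table_shared) {
		file->f_count++;
		f.flags |= FDPUT_FPUT;
	}
	return f;
}

/* Models fdput(): drops a reference only if fdget() took one. */
void model_fdput(struct fd f)
{
	if (f.flags & FDPUT_FPUT)
		f.file->f_count--;
}

/* Models __bpf_map_get() as quoted above: on a non-map file it calls
 * fdput() itself before reporting the error. Returns false on error. */
bool model_bpf_map_get(struct fd f)
{
	if (!f.file)
		return false;
	if (!f.file->is_bpf_map) {
		model_fdput(f);
		return false;
	}
	return true;
}

/* Models the lookup in replace_map_fd_with_map_ptr() and returns the
 * reference count afterwards. file must not be NULL. */
long model_lookup(struct file *file, bool table_shared,
		  bool caller_puts_on_error)
{
	struct fd f = model_fdget(file, table_shared);

	if (!model_bpf_map_get(f)) {
		if (caller_puts_on_error)
			model_fdput(f);	/* second put on the same struct fd */
		return file->f_count;
	}
	model_fdput(f);
	return file->f_count;
}
```

    In this model, a non-map file that starts with a reference count of 1
    (the reference held by the file descriptor table) ends up at 0 when the
    table is shared and both puts run, although the table still points at it.
    With an unshared table, or with only one put, the count stays at 1.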
    
    A simple proof of concept that causes oopses/crashes on a kernel compiled with
    memory debugging options is attached as crasher.tar.
    
    
    One way to exploit this issue is to create a writable file descriptor, start a
    write operation on it, wait for the kernel to verify the file's writability,
    then free the writable file and open a read-only file that is allocated in the
    same place before the kernel writes into the freed file. This lets an attacker
    write data to a read-only file. For example, by writing to /etc/crontab, root
    privileges can then be obtained.
    
    There are two problems with this approach, each of which has a workaround:
    
    The attacker should ideally be able to determine whether a newly allocated
    struct file is located at the same address as the previously freed one. Linux
    provides a syscall that performs exactly this comparison for the caller:
    kcmp(getpid(), getpid(), KCMP_FILE, uaf_fd, new_fd).
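
    glibc does not provide a wrapper for kcmp(), so the comparison is usually
    issued through syscall(). A minimal sketch follows; the helper name
    same_struct_file() is illustrative, and the call only succeeds on kernels
    built with kcmp() support.

```c
#define _GNU_SOURCE
#include <linux/kcmp.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd_a and fd_b of the calling process
 * refer to the same struct file, 0 if they do not, and -1 if kcmp() fails
 * (for example because the kernel lacks support for it). */
int same_struct_file(int fd_a, int fd_b)
{
	pid_t self = getpid();
	long ret = syscall(SYS_kcmp, (long)self, (long)self, (long)KCMP_FILE,
			   (unsigned long)fd_a, (unsigned long)fd_b);

	if (ret < 0)
		return -1;
	/* kcmp() returns 0 when both descriptors share one struct file and
	 * a positive ordering value otherwise. */
	return ret == 0;
}
```

    For example, a descriptor and its dup() share one struct file, so the
    helper reports 1 for that pair, while the two ends of a pipe are distinct
    files and yield 0.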
    
    In order to make exploitation more reliable, the attacker should be able to
    pause code execution in the kernel between the writability check of the target
    file and the actual write operation. This can be done by abusing the writev()
    syscall and FUSE: The attacker mounts a FUSE filesystem that artificially delays
    read accesses, then mmap()s a file containing a struct iovec from that FUSE
    filesystem and passes the result of mmap() to writev(). (Another way to do this
    would be to use the userfaultfd() syscall.)
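
    The data flow of that trick, without the delay, can be sketched as
    follows. This example backs the struct iovec with an ordinary temporary
    file instead of a FUSE mount, so nothing is stalled; it only shows that
    writev() accepts an iovec array that lives in a file-backed mapping. The
    helper name writev_with_mapped_iovec() and the /tmp path are assumptions
    made for illustration.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: stores one struct iovec describing buf/len in a
 * temporary file, maps that file, and hands the mapping to writev().
 * Returns the number of bytes written to out_fd, or -1 on error. */
ssize_t writev_with_mapped_iovec(int out_fd, void *buf, size_t len)
{
	char path[] = "/tmp/iovec-XXXXXX";
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct iovec *mapped;
	ssize_t ret = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);	/* the open descriptor keeps the file alive */
	if (write(fd, &iov, sizeof(iov)) != (ssize_t)sizeof(iov))
		goto out;

	/* The kernel reads the iovec array through this mapping, so the
	 * access may have to fault the page in from the backing file. */
	mapped = mmap(NULL, sizeof(iov), PROT_READ, MAP_PRIVATE, fd, 0);
	if (mapped == MAP_FAILED)
		goto out;

	ret = writev(out_fd, mapped, 1);
	munmap(mapped, sizeof(iov));
out:
	close(fd);
	return ret;
}
```

    With a filesystem that answers reads immediately, the call completes
    right away. The point of using FUSE in the scenario above is that the
    filesystem serving the mapped page decides when the page fault resolves.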
    
    writev() calls do_writev(), which looks up the struct file * corresponding to
    the file descriptor number and then calls vfs_writev(). vfs_writev() verifies
    that the target file is writable, then calls do_readv_writev(), which first
    copies the struct iovec from userspace using import_iovec(), then performs the
    rest of the write operation. Because import_iovec() performs a userspace memory
    access, it may have to wait for pages to be faulted in. In this case, it has
    to wait for the attacker-owned FUSE filesystem to resolve the page fault,
    which lets the attacker suspend code execution in the kernel at that point
    for an arbitrary amount of time.
    
    An exploit that puts all this together is in exploit.tar. Usage:
    
    user@host:~/ebpf_mapfd_doubleput$ ./compile.sh
    user@host:~/ebpf_mapfd_doubleput$ ./doubleput
    starting writev
    woohoo, got pointer reuse
    writev returned successfully. if this worked, you'll have a root shell in <=60 seconds.
    suid file detected, launching rootshell...
    we have root privs now...
    root@host:~/ebpf_mapfd_doubleput# id
    uid=0(root) gid=0(root) groups=0(root),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),113(lpadmin),128(sambashare),999(vboxsf),1000(user)
    
    This exploit was tested on an Ubuntu 16.04 Desktop system.
    
    Fix: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8358b02bf67d3a5d8a825070e1aa73f25fb2e4c7
    
    
    Proof of Concept: https://bugs.chromium.org/p/project-zero/issues/attachment?aid=232552
    Exploit-DB Mirror: https://gitlab.com/exploit-database/exploitdb-bin-sploits/-/raw/main/bin-sploits/39772.zip