XNU – Remote Double-Free via Data Race in IPComp Input Path

  • Author: Google Security Research
    Date: 2019-10-09
  • Category:
    Platform:
  • Source: https://www.exploit-db.com/exploits/47479/

    === Summary ===
    This report describes a bug in the XNU implementation of the IPComp protocol
    (https://tools.ietf.org/html/rfc3173). This bug can be remotely triggered by an
    attacker who is able to send traffic to a macOS system (iOS AFAIK isn't
    affected) *over two network interfaces at the same time*.
    
    
    === Some basics to provide context ===
    IPComp is a protocol for compressing the payload of IP packets.
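
For orientation, RFC 3173 defines a 4-byte IPComp header: an 8-bit Next Header field, an 8-bit Flags field that must be zero, and a 16-bit Compression Parameter Index (CPI). The sketch below serializes that header. The function and macro names are made up for illustration and are not XNU identifiers.

```c
#include <stdint.h>

#define SKETCH_IPPROTO_IPCOMP 108 /* IANA-assigned IP protocol number for IPComp */
#define SKETCH_CPI_DEFLATE    2   /* well-known CPI value for DEFLATE */

/* Writes the 4-byte IPComp header described in RFC 3173 into out[]. */
void sketch_ipcomp_write_header(uint8_t out[4], uint8_t next_header, uint16_t cpi)
{
    out[0] = next_header;           /* protocol of the original (uncompressed) payload */
    out[1] = 0;                     /* flags: reserved, must be zero */
    out[2] = (uint8_t)(cpi >> 8);   /* CPI, high byte first (network byte order) */
    out[3] = (uint8_t)(cpi & 0xff); /* CPI, low byte */
}
```

The enclosing IP header would carry protocol number 108 (`SKETCH_IPPROTO_IPCOMP` here), and the compressed payload follows these four bytes.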
    
    The XNU implementation of IPComp is (going by the last public XNU release)
    enabled only on X86-64; ARM64 doesn't seem to have the feature enabled at all
    (look for ipcomp_zlib in config/MASTER.x86_64 and config/MASTER.arm64). In other
    words, it's enabled on macOS and disabled on iOS.
    
    While IPComp is related to IPsec, the IPComp input path processes input even
    when the user has not configured any IPsec stuff on the system.
    
    zlib requires fairly large buffers for decompression and especially for
    compression. In order to avoid allocating such buffers for each packet, IPComp
    uses two global z_stream instances "deflate_stream" and "inflate_stream".
    If IPComp isn't used, the buffer pointers in these z_stream instances remain
    NULL; the kernel attempts to initialize them only once IPComp is actually
    used.
    
    As far as I can tell, the IPComp implementation of XNU has been completely
    broken for years, which makes it impossible to actually reach the decompression
    code. ipcomp_algorithm_lookup() is responsible for allocating global buffers for
    the compression and decompression code; however, all of these allocations go
    through deflate_alloc(), which (since xnu-1228, which corresponds to macOS 10.5
    from 2007) calls _MALLOC() with M_NOWAIT. _MALLOC() leads to kalloc_canblock(),
    which, if the M_NOWAIT flag was set and the allocation is too big for a kalloc
    zone (size >= kalloc_max_prerounded), immediately returns NULL. On X86-64,
    kalloc_max_prerounded is 0x2001; both deflateInit2() and inflateInit2() attempt
    allocations bigger than that, causing them to fail with Z_MEM_ERROR, as is
    visible with dtrace when observing the system's reaction to a single incoming
    IPComp packet [empty lines removed]:
    
    ```
    bash-3.2# ./inflate_test.dtrace
    dtrace: script './inflate_test.dtrace' matched 11 probes
    CPU ID FUNCTION:NAME
    0 243037 deflateInit2_:entry deflate init (thread=ffffff802db84a40)
    0 224285 kalloc_canblock:entry kalloc_canblock(size=0x1738, canblock=0, site=ffffff8018e787e8)
    0 224286 kalloc_canblock:return kalloc_canblock()=0xffffff80496b9800
    0 224285 kalloc_canblock:entry kalloc_canblock(size=0x2000, canblock=0, site=ffffff8018e787e8)
    0 224286 kalloc_canblock:return kalloc_canblock()=0xffffff802f42f000
    0 224285 kalloc_canblock:entry kalloc_canblock(size=0x2000, canblock=0, site=ffffff8018e787e8)
    0 224286 kalloc_canblock:return kalloc_canblock()=0x0
    0 224285 kalloc_canblock:entry kalloc_canblock(size=0x20000, canblock=0, site=ffffff8018e787e8)
    0 224286 kalloc_canblock:return kalloc_canblock()=0x0
    0 224285 kalloc_canblock:entry kalloc_canblock(size=0x20000, canblock=0, site=ffffff8018e787e8)
    0 224286 kalloc_canblock:return kalloc_canblock()=0x0
    0 243038 deflateInit2_:return rval=0xfffffffc
    0 243073 inflateInit2_:entry inflate init (thread=ffffff802db84a40)
    0 224285 kalloc_canblock:entry kalloc_canblock(size=0x2550, canblock=0, site=ffffff8018e787e8)
    0 224286 kalloc_canblock:return kalloc_canblock()=0x0
    0 243074 inflateInit2_:return rval=0xfffffffc
    ```
    
    (On iOS, the kalloc() limit seems to be higher, so if IPComp was built there,
    the input path might actually work?)
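
The size rule described above can be written down as a tiny model. This is only a sketch of the one condition named in the text (non-blocking request plus a size at or above kalloc_max_prerounded); the function and macro names are hypothetical, and it is not the XNU code. Smaller non-blocking allocations can still fail for other reasons, which this model does not cover.

```c
#include <stdbool.h>
#include <stddef.h>

/* Value of kalloc_max_prerounded on X86-64, as stated in the report. */
#define MODEL_KALLOC_MAX_PREROUNDED 0x2001u

/*
 * Returns true if, under the rule described in the text, the allocation
 * would be rejected immediately: the caller may not block (M_NOWAIT) and
 * the request is too big for a kalloc zone.
 */
bool model_fails_immediately(size_t size, bool canblock)
{
    return !canblock && size >= MODEL_KALLOC_MAX_PREROUNDED;
}
```

Applied to the sizes in the trace, the 0x20000 and 0x2550 requests fall on the failing side of this rule, while 0x1738 and 0x2000 do not.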
    
    
    === main bug description ===
    IPComp uses a single global `static z_stream inflate_stream` for decompressing
    all incoming packets. This global is used without any locking. While processing
    of packets from a single interface seems to be single-threaded, packets arriving
    on multiple ethernet interfaces at the same time (or on an ethernet interface
    and a non-ethernet interface) can be processed in parallel (see
    dlil_create_input_thread() and its caller for the precise threading rules).
    Since zlib isn't designed for concurrent use of a z_stream, this leads to memory
    corruption.
    
    If IPComp actually worked, I believe that this bug would lead to
    things like out-of-bounds reads, out-of-bounds writes and use-after-frees.
    However, since IPComp never actually manages to set up the compression and
    decompression state, the bug instead manifests in the code that, for every
    incoming IPComp packet, attempts to set up the deflate buffers and tears down
    the successfully allocated buffers because some of the allocations failed:
    
    ```
    int ZEXPORT
    deflateInit2_(z_streamp strm, int level, int method, int windowBits,
                  int memLevel, int strategy, const char *version,
                  int stream_size)
    {
        [...]
        if (memLevel < 1 || memLevel > MAX_MEM_LEVEL || method != Z_DEFLATED ||
            windowBits < 8 || windowBits > 15 || level < 0 || level > 9 ||
            strategy < 0 || strategy > Z_FIXED) {
            return Z_STREAM_ERROR;
        }
        if (windowBits == 8) windowBits = 9; /* until 256-byte window bug fixed */
        s = (deflate_state *) ZALLOC(strm, 1, sizeof(deflate_state));
        if (s == Z_NULL) return Z_MEM_ERROR;
        strm->state = (struct internal_state FAR *)s;
        [...]
        s->window = (Bytef *) ZALLOC(strm, s->w_size, 2*sizeof(Byte));
        s->prev = (Posf *) ZALLOC(strm, s->w_size, sizeof(Pos));
        s->head = (Posf *) ZALLOC(strm, s->hash_size, sizeof(Pos));

        s->lit_bufsize = 1 << (memLevel + 6); /* 16K elements by default */

        overlay = (ushf *) ZALLOC(strm, s->lit_bufsize, sizeof(ush)+2);
        s->pending_buf = (uchf *) overlay;
        [...]
        if (s->window == Z_NULL || s->prev == Z_NULL || s->head == Z_NULL ||
            s->pending_buf == Z_NULL) {
            [...]
            deflateEnd (strm);
            return Z_MEM_ERROR;
        }
        [...]
    }
    [...]
    int ZEXPORT
    deflateEnd(z_streamp strm)
    {
        [...]
        /* Deallocate in reverse order of allocations: */
        TRY_FREE(strm, strm->state->pending_buf);
        TRY_FREE(strm, strm->state->head);
        TRY_FREE(strm, strm->state->prev);
        TRY_FREE(strm, strm->state->window);

        ZFREE(strm, strm->state);
        strm->state = Z_NULL;

        return status == BUSY_STATE ? Z_DATA_ERROR : Z_OK;
    }
    ```
    
    When multiple executions of this code race, it is possible for two threads to
    free the same buffer, causing a double-free:
    
    ```
    *** Panic Report ***
    panic(cpu 2 caller 0xffffff8012802df5): "zfree: double free of 0xffffff80285d9000 to zone kalloc.8192\n"@/BuildRoot/Library/Caches/com.apple.xbs/Sources/xnu/xnu-4903.261.4/osfmk/kern/zalloc.c:1304
    Backtrace (CPU 2), Frame : Return Address
    0xffffff912141b420 : 0xffffff80127aea2d mach_kernel : _handle_debugger_trap + 0x47d
    0xffffff912141b470 : 0xffffff80128e9e95 mach_kernel : _kdp_i386_trap + 0x155
    0xffffff912141b4b0 : 0xffffff80128db70a mach_kernel : _kernel_trap + 0x50a
    0xffffff912141b520 : 0xffffff801275bb40 mach_kernel : _return_from_trap + 0xe0
    0xffffff912141b540 : 0xffffff80127ae447 mach_kernel : _panic_trap_to_debugger + 0x197
    0xffffff912141b660 : 0xffffff80127ae293 mach_kernel : _panic + 0x63
    0xffffff912141b6d0 : 0xffffff8012802df5 mach_kernel : _zcram + 0xa15
    0xffffff912141b710 : 0xffffff8012804d4a mach_kernel : _zfree + 0x67a
    0xffffff912141b7f0 : 0xffffff80127bac58 mach_kernel : _kfree_addr + 0x68
    0xffffff912141b850 : 0xffffff8012dfc837 mach_kernel : _deflateEnd + 0x87
    0xffffff912141b870 : 0xffffff8012dfc793 mach_kernel : _deflateInit2_ + 0x253
    0xffffff912141b8c0 : 0xffffff8012c164a3 mach_kernel : _ipcomp_algorithm_lookup + 0x63
    0xffffff912141b8f0 : 0xffffff8012c16fb2 mach_kernel : _ipcomp4_input + 0x112
    0xffffff912141b990 : 0xffffff8012b89907 mach_kernel : _ip_proto_dispatch_in_wrapper + 0x1a7
    0xffffff912141b9e0 : 0xffffff8012b8bfa6 mach_kernel : _ip_input + 0x18b6
    0xffffff912141ba40 : 0xffffff8012b8a5a9 mach_kernel : _ip_input_process_list + 0xc69
    0xffffff912141bcb0 : 0xffffff8012aac3ed mach_kernel : _proto_input + 0x9d
    0xffffff912141bce0 : 0xffffff8012a76c41 mach_kernel : _ether_attach_inet + 0x471
    0xffffff912141bd70 : 0xffffff8012a6b036 mach_kernel : _dlil_rxpoll_set_params + 0x1b36
    0xffffff912141bda0 : 0xffffff8012a6aedc mach_kernel : _dlil_rxpoll_set_params + 0x19dc
    0xffffff912141bf10 : 0xffffff8012a692e9 mach_kernel : _ifp_if_ioctl + 0x10d9
    0xffffff912141bfa0 : 0xffffff801275b0ce mach_kernel : _call_continuation + 0x2e
    
    BSD process name corresponding to current thread: kernel_task
    Boot args: -zp -v keepsyms=1
    
    Mac OS version:
    18F132
    
    Kernel version:
    Darwin Kernel Version 18.6.0: Thu Apr 25 23:16:27 PDT 2019; root:xnu-4903.261.4~2/RELEASE_X86_64
    Kernel UUID: 7C8BB636-E593-3CE4-8528-9BD24A688851
    Kernel slide: 0x0000000012400000
    Kernel text base: 0xffffff8012600000
    __HIBtext base: 0xffffff8012500000
    System model name: Macmini7,1 (Mac-XXXXXXXXXXXXXXXX)
    ```
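
To make the race easier to picture, here is a deterministic, single-threaded replay of one interleaving that would end in a double free. It is a toy under stated assumptions, not the XNU or zlib code: the types, the `toy_` names and the exact ordering of steps are invented for illustration, and nothing is really allocated or freed. It assumes that both threads run the init path on the same stream, that the second thread overwrites the shared state pointer, and that both then enter teardown before either clears it.

```c
#include <stddef.h>

struct toy_state  { void *window; };             /* stand-in for deflate_state */
struct toy_stream { struct toy_state *state; };  /* stand-in for the global z_stream */

static char toy_buffers[2];  /* [0]: buffer owned by thread A, [1]: by thread B */
int toy_free_calls[2];       /* how many times each buffer was handed to "free" */

/* Records the free instead of performing it, so the replay stays well-defined. */
static void toy_free(void *p)
{
    if (p == &toy_buffers[0]) toy_free_calls[0]++;
    if (p == &toy_buffers[1]) toy_free_calls[1]++;
}

/* Replays the assumed interleaving; returns how often B's buffer was freed. */
int toy_simulate_interleaving(void)
{
    struct toy_state a = { &toy_buffers[0] };
    struct toy_state b = { &toy_buffers[1] };
    struct toy_stream shared = { NULL };

    toy_free_calls[0] = toy_free_calls[1] = 0;

    shared.state = &a;  /* thread A: init publishes its state in the shared stream */
    shared.state = &b;  /* thread B: init overwrites the shared pointer */

    /* Both inits fail; each thread loads the buffer pointer via the shared stream. */
    void *seen_by_a = shared.state->window;
    void *seen_by_b = shared.state->window;

    toy_free(seen_by_a);  /* thread A frees B's buffer */
    toy_free(seen_by_b);  /* thread B frees the same buffer again */
    shared.state = NULL;

    return toy_free_calls[1];
}
```

In this replay the buffer that thread B allocated is released twice, while thread A's buffer is never released at all.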
    
    
    === Repro steps ===
    You'll need a Mac (I used a Mac mini) and a Linux workstation.
    Stick two USB ethernet adapters into the Mac.
    Make sure that your Linux workstation has two free ethernet ports; if it
    doesn't, also stick USB ethernet adapters into your workstation.
    Take two ethernet cables; for both of them, stick one end into the Linux
    workstation and the other end into the Mac.
    Set up static IP addresses for both interfaces on the Linux box and the Mac. I'm
    using:
    
     - Linux, first connection: 192.168.250.1/24
     - Mac, first connection: 192.168.250.2/24
     - Linux, second connection: 192.168.251.1/24
     - Mac, second connection: 192.168.251.2/24
    
    On the Linux workstation, ping both IP addresses of the Mac, then dump the
    relevant ARP table entries:
    
    ```
    $ ping -c1 192.168.250.2
    PING 192.168.250.2 (192.168.250.2) 56(84) bytes of data.
    64 bytes from 192.168.250.2: icmp_seq=1 ttl=64 time=0.794 ms
    [...]
    $ ping -c1 192.168.251.2
    PING 192.168.251.2 (192.168.251.2) 56(84) bytes of data.
    64 bytes from 192.168.251.2: icmp_seq=1 ttl=64 time=0.762 ms
    [...]
    $ arp -n | egrep '192\.168\.25[01]'
    192.168.250.2 ether aa:aa:aa:aa:aa:aa C eth0
    192.168.251.2 ether bb:bb:bb:bb:bb:bb C eth1
    $ 
    ```
    
    On the Linux workstation, build the attached ipcomp_uaf.c and run it:
    
    ```
    $ gcc -o ipcomp_uaf ipcomp_uaf.c -Wall
    $ sudo bash
    # ./ipcomp_uaf
    usage: ./ipcomp_uaf <if1> <target_mac1> <src_ip1> <dst_ip1> <if2> <target_mac2> <src_ip2> <dst_ip2>
    # ./ipcomp_uaf eth0 aa:aa:aa:aa:aa:aa 192.168.250.1 192.168.250.2 eth1 bb:bb:bb:bb:bb:bb 192.168.251.1 192.168.251.2
    ```
    
    After something like a second, you should be able to observe that the Mac panics.
    I have observed panics via double-free and via null deref triggered by the PoC.
    (Stop the PoC afterwards, otherwise it'll panic again as soon as the network
    interfaces are up.)
    
    (The PoC also works if you use broadcast addresses as follows:)

    ```
    # ./ipcomp_uaf eth0 ff:ff:ff:ff:ff:ff 0.0.0.0 255.255.255.255 eth1 ff:ff:ff:ff:ff:ff 0.0.0.0 255.255.255.255
    ```
    
    
    === Fixing the bug ===
    I believe that by far the best way to fix this issue is to rip out the entire
    feature. Unless I'm missing some way for the initialization to succeed, it looks
    like nobody can have successfully used this feature in the last few years; and
    apparently nobody felt strongly enough about that to get the feature fixed.
    At the same time, this thing is remote attack surface in the IP stack, and it
    looks like it has already led to a remote DoS bug in the past - the first search
    result on bing.com for both "ipcomp macos" and "ipcomp xnu" is
    <https://www.exploit-db.com/exploits/5191>.
    
    In case you decide to fix the bug in a different way, please note:
    
     - I believe that this can *NOT* be fixed by removing the PR_PROTOLOCK flag from
     the entries in `inetsw` and `inet6sw`. While removal of that flag would cause
     the input code to take the domain mutex before invoking the protocol handler,
     IPv4 and IPv6 are different domains, and so concurrent processing of
     IPv4+IPComp and IPv6+IPComp packets would probably still trigger the bug.
     - If you decide to fix the memory allocation of IPComp so that the input path
     works again (please don't - you'll never again have such a great way to prove
     that nobody is using that code), I think another bug will become reachable:
     I don't see anything that prevents unbounded recursion between
     ip_proto_dispatch_in() and ipcomp4_input() using an IP packet with a series
     of IPComp headers, which would be usable to cause a kernel panic via stack
     overflow with a single IP packet.
     In case you want to play with that, I wrote a PoC that generates packets with
     100 such headers and attached it as ipcomp_recursion.c.
     (The other IPv6 handlers for pseudo-protocols like IPPROTO_FRAGMENT seem to
     avoid this problem by having the )
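
The recursion concern in the second note can be illustrated with a toy model in which every chained IPComp header costs one more level of call depth. This is a sketch only: the `toy_` functions are hypothetical stand-ins for the dispatch and input routines, the decompression step is skipped entirely, and the headers are laid out back to back, which is an assumption made for the model and not a description of the attached PoC.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_IPPROTO_IPCOMP 108 /* IANA protocol number for IPComp */
#define TOY_IPPROTO_UDP    17  /* arbitrary non-IPComp protocol that ends the chain */
#define TOY_HDR_LEN        4   /* IPComp header length per RFC 3173 */
#define TOY_MAX_HEADERS    256

static int toy_dispatch(uint8_t proto, const uint8_t *pkt, size_t len, int depth);

/* Stand-in for the IPComp input routine: strip one header, re-dispatch the rest. */
static int toy_ipcomp_input(const uint8_t *pkt, size_t len, int depth)
{
    if (len < TOY_HDR_LEN)
        return depth;
    /* Real code would decompress the payload here; the toy skips that step. */
    return toy_dispatch(pkt[0], pkt + TOY_HDR_LEN, len - TOY_HDR_LEN, depth + 1);
}

/* Stand-in for protocol dispatch: only IPComp leads to further nesting. */
static int toy_dispatch(uint8_t proto, const uint8_t *pkt, size_t len, int depth)
{
    if (proto == TOY_IPPROTO_IPCOMP)
        return toy_ipcomp_input(pkt, len, depth);
    return depth;
}

/* Builds a chain of `headers` IPComp headers and returns the call depth reached. */
int toy_depth_for_chain(int headers)
{
    uint8_t pkt[TOY_HDR_LEN * TOY_MAX_HEADERS] = {0};

    if (headers < 0 || headers > TOY_MAX_HEADERS)
        return -1;
    for (int i = 0; i < headers; i++)
        pkt[i * TOY_HDR_LEN] = (i + 1 < headers) ? TOY_IPPROTO_IPCOMP : TOY_IPPROTO_UDP;
    return toy_dispatch(TOY_IPPROTO_IPCOMP, pkt, (size_t)headers * TOY_HDR_LEN, 0);
}
```

In this model the depth grows linearly with the number of headers and nothing caps it, which is the property the note above is worried about.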
    
    
    Proof of Concept:
    https://gitlab.com/exploit-database/exploitdb-bin-sploits/-/raw/main/bin-sploits/47479.zip