Xen allows pagetables of the same level to map each other as readonly
in PV domains. This is useful if a guest wants to use the
self-referential pagetable trick for easy access to pagetables
by mapped virtual address.
When cleaning up a pagetable after the last typed reference to it has been
dropped (via __put_page_type() -> __put_final_page_type() -> free_page_type()),
Xen will recursively drop the typed refcounts of pages referenced by the pagetable,
potentially recursively cleaning them up as well.
For normal pagetables, the recursion depth is bounded by the number of paging levels
the architecture supports. However, no such depth limit exists for pagetables of the
same depth that map each other.
The attached PoC will set up a chain of 1000 L4 pagetables such that
the first pagetable is type-pinned and each following pagetable is referenced by the
previous one. Then, the type-pin of the first pagetable is removed, and the following
999 pagetables are recursively cleaned up, causing a stack overflow.
To run the PoC in a PV domain, install kernel headers, then run ./compile, then load the built module via insmod.
Xen console output caused by running the PoC inside a normal PV domain:
==============================
(XEN) Xen version 4.8.1 (Debian 4.8.1-1+deb9u3) (ian.jackson@eu.citrix.com) (gcc (Debian 6.3.0-18) 6.3.0 20170516) debug=nThu Sep7 18:24:26 UTC 2017
(XEN) Bootloader: GRUB 2.02~beta3-5
(XEN) Command line: loglvl=all com1=115200,8n1,pci console=com1 placeholder
(XEN) Video information:
(XEN)VGA is text mode 80x25, font 8x16
(XEN) Disc information:
(XEN)Found 1 MBR signatures
(XEN)Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)0000000000000000 - 000000000009fc00 (usable)
(XEN)000000000009fc00 - 00000000000a0000 (reserved)
(XEN)00000000000f0000 - 0000000000100000 (reserved)
(XEN)0000000000100000 - 00000000dfff0000 (usable)
(XEN)00000000dfff0000 - 00000000e0000000 (ACPI data)
(XEN)00000000fec00000 - 00000000fec01000 (reserved)
(XEN)00000000fee00000 - 00000000fee01000 (reserved)
(XEN)00000000fffc0000 - 0000000100000000 (reserved)
(XEN)0000000100000000 - 0000000120000000 (usable)
(XEN) ACPI: RSDP 000E0000, 0024 (r2 VBOX)
(XEN) ACPI: XSDT DFFF0030, 003C (r1 VBOX VBOXXSDT1 ASL61)
(XEN) ACPI: FACP DFFF00F0, 00F4 (r4 VBOX VBOXFACP1 ASL61)
(XEN) ACPI: DSDT DFFF0470, 210F (r1 VBOX VBOXBIOS2 INTL 20140214)
(XEN) ACPI: FACS DFFF0200, 0040
(XEN) ACPI: APIC DFFF0240, 0054 (r2 VBOX VBOXAPIC1 ASL61)
(XEN) ACPI: SSDT DFFF02A0, 01CC (r1 VBOX VBOXCPUT2 INTL 20140214)
(XEN) System RAM: 4095MB (4193852kB)
(XEN) No NUMA configuration found
(XEN) Faking a node at 0000000000000000-0000000120000000
(XEN) Domain heap initialised
(XEN) CPU Vendor: Intel, Family 6 (0x6), Model 78 (0x4e), Stepping 3 (raw 000406e3)
(XEN) found SMP MP-table at 0009fff0
(XEN) DMI 2.5 present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x4008 (32 bits)
(XEN) ACPI: SLEEP INFO: pm1x_cnt[1:4004,1:0], pm1x_evt[1:4000,1:0]
(XEN) ACPI: wakeup_vec[dfff020c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
(XEN) ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode:Flat.Using 1 I/O APICs
(XEN) ERST table was not found
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) SMP: Allowing 1 CPUs (0 hotplug CPUs)
(XEN) IRQ limits: 24 GSI, 184 MSI/MSI-X
(XEN) Not enabling x2APIC: depends on iommu_supports_eim.
(XEN) xstate: size: 0x440 and states: 0x7
(XEN) CPU0: No MCE banks present. Machine check support disabled
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Platform timer is 3.579MHz ACPI PM Timer
(XEN) Detected 2807.850 MHz processor.
(XEN) Initing memory sharing.
(XEN) alt table ffff82d0802bcf38 -> ffff82d0802be594
(XEN) I/O virtualisation disabled
(XEN) nr_sockets: 1
(XEN) ENABLING IO-APIC IRQs
(XEN)-> Using new ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) Allocated console ring of 16 KiB.
(XEN) Brought up 1 CPUs
(XEN) build-id: cd504b2b380e2fe1265376aa845a404b9eb86982
(XEN) CPUIDLE: disabled due to no HPET. Force enable with 'cpuidle'.
(XEN) ACPI sleep modes: S3
(XEN) VPMU: disabled
(XEN) xenoprof: Initialization failed. Intel processor family 6 model 78is not supported
(XEN) Dom0 has maximum 208 PIRQs
(XEN) NX (Execute Disable) protection active
(XEN) *** LOADING DOMAIN 0 ***
(XEN)Xenkernel: 64-bit, lsb, compat32
(XEN)Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x1f5a000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)Dom0 alloc.: 0000000118000000->000000011a000000 (989666 pages to be allocated)
(XEN)Init. ramdisk: 000000011ed74000->000000011ffff3b5
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)Loaded kernel: ffffffff81000000->ffffffff81f5a000
(XEN)Init. ramdisk: 0000000000000000->0000000000000000
(XEN)Phys-Mach map: 0000008000000000->00000080007a6370
(XEN)Start info:ffffffff81f5a000->ffffffff81f5a4b4
(XEN)Page tables: ffffffff81f5b000->ffffffff81f6e000
(XEN)Boot stack:ffffffff81f6e000->ffffffff81f6f000
(XEN)TOTAL: ffffffff80000000->ffffffff82000000
(XEN)ENTRY ADDRESS: ffffffff81d38180
(XEN) Dom0 has maximum 1 VCPUs
(XEN) Scrubbing Free RAM on 1 nodes using 1 CPUs
(XEN) ....................................done.
(XEN) Initial low memory virq threshold set at 0x4000 pages.
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 312kB init memory
mapping kernel into physical memory
about to get started...
(XEN) d0 attempted to change d0v0's CR4 flags 00000620 -> 00040660
(XEN) PCI add device 0000:00:00.0
(XEN) PCI add device 0000:00:01.0
(XEN) PCI add device 0000:00:01.1
(XEN) PCI add device 0000:00:02.0
(XEN) PCI add device 0000:00:03.0
(XEN) PCI add device 0000:00:04.0
(XEN) PCI add device 0000:00:05.0
(XEN) PCI add device 0000:00:06.0
(XEN) PCI add device 0000:00:07.0
(XEN) PCI add device 0000:00:08.0
(XEN) PCI add device 0000:00:0d.0
Debian GNU/Linux 9 xenhost hvc0
xenhost login: (XEN) d1 attempted to change d1v0's CR4 flags 00000620 -> 00040660
(XEN) d1 attempted to change d1v1's CR4 flags 00000620 -> 00040660
(XEN) *** DOUBLE FAULT ***
(XEN) ----[ Xen-4.8.1x86_64debug=n Not tainted ]----
(XEN) CPU:0
(XEN) RIP:e008:[<ffff82d08017962a>] free_page_type+0xea/0x630
(XEN) RFLAGS: 0000000000010206 CONTEXT: hypervisor
(XEN) rax: 000000000000a3db rbx: ffff82e000147b60 rcx: 0000000000000000
(XEN) rdx: ffff830000000000 rsi: 4000000000000000 rdi: 000000000000a3db
(XEN) rbp: 4400000000000001 rsp: ffff8300dfce5ff8 r8:ffff8300dfce7fff
(XEN) r9:ffff82d0802f2980 r10: 0000000000000000 r11: 0000000000000202
(XEN) r12: 000000000000a3db r13: ffff83011fd74000 r14: ffff83011fd74000
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000406a0
(XEN) cr3: 000000000702d000 cr2: ffff8300dfce5fe8
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Valid stack range: ffff8300dfce6000-ffff8300dfce8000, sp=ffff8300dfce5ff8, tss.esp0=ffff8300dfce7fc0
(XEN) Xen stack overflow (dumping trace ffff8300dfce6000-ffff8300dfce8000):
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d08016af21>] io_apic.c#ack_edge_ioapic_irq+0x11/0x60
(XEN)[<ffff82d08016af21>] io_apic.c#ack_edge_ioapic_irq+0x11/0x60
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d0801793ae>] mm.c#get_page_from_pagenr+0x4e/0x60
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d0801768e9>] is_iomem_page+0x9/0x70
(XEN)[<ffff82d08010baec>] grant_table.c#__gnttab_unmap_common_complete+0x17c/0x360
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080146684>] serial_tx_interrupt+0xe4/0x120
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017234a>] do_IRQ+0x22a/0x660
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080237f6f>] common_interrupt+0x5f/0x70
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d08017a028>] put_page_from_l1e+0xb8/0x130
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d08017a28a>] mm.c#put_page_from_l2e+0x7a/0x190
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d08017a438>] mm.c#put_page_from_l4e+0x88/0xc0
(XEN)[<ffff82d080179697>] free_page_type+0x157/0x630
(XEN)[<ffff82d0801793ae>] mm.c#get_page_from_pagenr+0x4e/0x60
(XEN)[<ffff82d080179cdf>] mm.c#__put_page_type+0x16f/0x290
(XEN)[<ffff82d0801791e3>] get_page+0x13/0xf0
(XEN)[<ffff82d080183056>] do_mmuext_op+0x1056/0x1500
(XEN)[<ffff82d080182000>] do_mmuext_op+0/0x1500
(XEN)[<ffff82d080169c96>] pv_hypercall+0xf6/0x1c0
(XEN)[<ffff82d08019bea3>] do_page_fault+0x163/0x4c0
(XEN)[<ffff82d080237abe>] entry.o#test_all_events+0/0x2a
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) DOUBLE FAULT -- system shutdown
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
==============================
This PoC just causes a DoS, but as far as I can tell, Xen only uses
guard pages for the stack (via memguard_guard_stack()) in debug builds,
which would mean that this is a potentially exploitable issue in release builds.
Proof of Concept:
https://gitlab.com/exploit-database/exploitdb-bin-sploits/-/raw/main/bin-sploits/43014.zip