Source: https://bugs.chromium.org/p/project-zero/issues/detail?id=851
This is very similar to forshaw's bug (<https://code.google.com/p/android/issues/detail?id=200617>, <https://bugs.chromium.org/p/project-zero/issues/detail?id=727>).
The servicemanager, when determining whether the sender of a binder transaction is authorized to register a service via SVC_MGR_ADD_SERVICE, looks up the sender's SELinux context using getpidcon(spid), where spid is the value of the sender_pid field in the binder_transaction_data that was received from the binder driver.
This is problematic because getpidcon($pid) is only safe to use if the caller either knows that the process originally referenced by $pid can't transition from zombie to dead (normally because it is the parent or ptracer of $pid) or if the caller can validate that the process referenced by $pid can not have spawned before $pid referred to the correct process based on the age of the process that $pid points to after the getpidcon() call. (The same thing applies to pretty much any API that refers to processes using PIDs.)
This means that an attacker can, at least theoretically, register arbitrary services that would normally be provided by the system_server if he can execute / cause execution of the following operations in the right order:
- The main exploit process $exploit forks, creates process $child
- $child does $binder_fd = open("/dev/binder", ...)
- $child forks, creates process $subchild
- $child exits. The binder_proc belonging to $binder_fd still holds a reference
to $child. $child transitions to zombie status.
- The exploit repeatedly forks processes that instantly die until there are no unallocated
PIDs between ns_last_pid and $child's PID.
- $subchild sends a SVC_MGR_ADD_SERVICE binder message to the service manager
- the service manager receives the binder message. The kernel fills the
sender_pid field with the result of `task_tgid_nr_ns(sender, [...])`,
where `sender` is `t->from->proc->tsk`, the task_struct of $child.
- $exploit uses `waitpid()` to transition $child from zombie to dead status
- $exploit sends a HANDLE_APPLICATION_STRICT_MODE_VIOLATION_TRANSACTION
binder message to system_server
- system_server launches a new worker thread
(in ActivityManagerService.logStrictModeViolationToDropBox)
- the service manager calls getpidcon()
- system_server's worker thread dies
As far as I can tell, this exploit approach contains the following race conditions:
- If $exploit calls waitpid() before the service manager has performed the binder
read (more accurately, before the task_tgid_nr_ns call), the service manager sees
PID 0. This race isn't hard to win, but it would help to have some primitive to either stall
the service manager after the task_tgid_nr_ns call or at least detect whether it has
performed the binder read. On older Android versions, voluntary_ctxt_switches
in /proc/$pid/status might have helped with that, but nowadays, that's blocked.
When this race condition fails, you'll get an SELinux denial with
scontext=servicemanager.
- If the service manager calls getpidcon() before the system_server has launched a
worker thread, the call will either fail (if there is no such PID) or return the
not-yet-reaped $child process. Again, having a primitive for stalling the service manager
would be useful here.
When this race condition fails, it will cause either an SELinux denial with
scontext=untrusted_app or an "failed to retrieve pid context" error from the
service manager.
- If the system_server's worker thread dies before getpidcon(), getpidcon() will fail.
To avoid this race, it would be very helpful to be able to spawn a thread in system_server
that has a controlled or at least somewhat longer lifetime.
Because of the multiple races, it is hard to hit this bug, at least without spending days on finding ways to eliminate races or widen race windows, optimizing the exploit to not cycle through the whole pid range for every attempt and so on. Because of that, I decided to run my PoC on a patched Android build (based on android-6.0.1_r46) with the following modifications to show that, while the race window is very hard to hit, there is such a race:
-------
$ repo diff
project frameworks/base/
diff --git a/services/core/java/com/android/server/am/ActivityManagerService.java b/services/core/java/com/android/server/am/ActivityManagerService.java
index 33d0a9f..371ecd7 100644
--- a/services/core/java/com/android/server/am/ActivityManagerService.java
+++ b/services/core/java/com/android/server/am/ActivityManagerService.java
@@ -12269,6 +12269,9 @@ public final class ActivityManagerService extends ActivityManagerNative
if (report.length() != 0) {
dbox.addText(dropboxTag, report);
}
+try {
+Thread.sleep(2000);
+} catch (InterruptedException e) {}
}
}.start();
return;
project frameworks/native/
diff --git a/cmds/servicemanager/service_manager.c b/cmds/servicemanager/service_manager.c
index 7fa9a39..0600eb1 100644
--- a/cmds/servicemanager/service_manager.c
+++ b/cmds/servicemanager/service_manager.c
@@ -7,6 +7,7 @@
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
+#include <unistd.h>
#include <private/android_filesystem_config.h>
@@ -204,6 +205,9 @@ int do_add_service(struct binder_state *bs,
if (!handle || (len == 0) || (len > 127))
return -1;
+if (uid > 1000)
+sleep(2);
+
if (!svc_can_register(s, len, spid)) {
ALOGE("add_service('%s',%x) uid=%d - PERMISSION DENIED\n",
str8(s, len), handle, uid);
-------
These modifications widen the race windows sufficiently to be able to hit the bug with a few tries.
On the modified build, my PoC causes the following logcat output, demonstrating that the clipboard service has been replaced successfully:
06-15 21:41:00.470 11876 11876 E FIELD--FIELD: accessFlags
06-15 21:41:00.470 11876 11876 E FIELD--FIELD: declaringClass
06-15 21:41:00.470 11876 11876 E FIELD--FIELD: dexFieldIndex
06-15 21:41:00.470 11876 11876 E FIELD--FIELD: offset
06-15 21:41:00.470 11876 11876 E FIELD--FIELD: type
06-15 21:41:00.470 11876 11876 E FIELD--FIELD: ORDER_BY_NAME_AND_DECLARING_CLASS
06-15 21:41:00.480 11876 11876 W racer : NATIVE CODE:trying attack...
06-15 21:41:01.490 11876 11876 W racer : NATIVE CODE:child_pid == unused_pid + 1
06-15 21:41:01.490 11876 11876 W racer : NATIVE CODE:cycle_to_pid...
06-15 21:41:02.900 11876 11876 W racer : NATIVE CODE:cycle_to_pid done
06-15 21:41:04.910 992 992 E ServiceManager: SELinux: getpidcon(pid=11993) failed to retrieve pid context.
06-15 21:41:04.910 992 992 E ServiceManager: add_service('clipboard',63) uid=10052 - PERMISSION DENIED
06-15 21:41:08.920 11876 11876 W racer : NATIVE CODE:pid of last try: 11993
06-15 21:41:08.920 11876 11876 W racer : NATIVE CODE:trying attack...
06-15 21:41:09.930 11876 11876 W racer : NATIVE CODE:child_pid == unused_pid + 1
06-15 21:41:09.930 11876 11876 W racer : NATIVE CODE:cycle_to_pid...
06-15 21:41:11.330 11876 11876 W racer : NATIVE CODE:cycle_to_pid done
06-15 21:41:13.340 992 992 E ServiceManager: add_service('clipboard',63) uid=10052 - ALREADY REGISTERED, OVERRIDE
(Also, to further verify the success: After running the PoC, clipboard accesses in newly spawned apps cause null reference exceptions because the PoC's binder thread has been released in the meantime.)
The issue was tested in the android emulator, with a aosp_x86_64-eng build of the patched android-6.0.1_r46 release.
I have attached the PoC apk (with native code for aarch64 and x86_64; I'm not sure whether the PoC compiles correctly for 32bit) and the Android project tree - but as mentioned earlier, note that the PoC won't work on a build without my patches. If you want to compile it yourself, first run `aarch64-linux-gnu-gcc -static -o app/src/main/jniLibs/arm64-v8a/libracer.so racer.c -Wall -std=gnu99 && gcc -static -o app/src/main/jniLibs/x86_64/libracer.so racer.c` to compile the binaries, then build the project in Android Studio.
I believe that the proper way to fix this issue would be to let the binder driver record the sender's SELinux context when a transaction is sent and then either let the recipient extract the current transaction's SELinux context via an ioctl or store the SELinux context in the binder message. PIDs should not be used during the SELinux context lookup.
Regarding impact:
It looks as if the vulnerable code in the service manager is reachable from isolated_app context, although being isolated is probably going to make it even more difficult to trigger the bug.
After a service is replaced, already-running code should usually continue to use the old service because that reference is cached.
If there is e.g. some system_app that performs permissions checks (which use the "permission" service), it might be possible to bypass such permission checks using this bug, by replacing the real permission service with one that always grants access.
Proof of Concept:
https://gitlab.com/exploit-database/exploitdb-bin-sploits/-/raw/main/bin-sploits/40381.zip