Unicorn is a multi-platform, multi-architecture CPU emulator based on QEMU. It is written in C and has bindings for several programming languages, e.g. Python, Ruby, Java and Go.

One operating mode of modern x86 CPUs is called Protected Mode. This mode offers features such as virtual memory and paging. Memory is split into segments, and the segment registers refer to them. Since the segment registers are only 16 bits wide, they cannot hold the address of a segment. Instead of segment addresses, the segment registers hold segment selectors, which are indices into the global or local descriptor table (GDT, LDT). You can find more information about these topics here and here.
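To make the selector layout concrete, here is a minimal sketch that splits a 16-bit selector into its three fields. The function name `decode_selector` is my own and not part of Unicorn or any other library. The layout it assumes is the standard x86 one: descriptor index in bits 3 to 15, table indicator in bit 2, requested privilege level in bits 0 and 1.

```python
def decode_selector(selector):
    """Split a 16-bit segment selector into (index, table, rpl)."""
    index = selector >> 3                        # bits 3..15: descriptor index
    table = 'LDT' if selector & 0x4 else 'GDT'   # bit 2: table indicator
    rpl = selector & 0x3                         # bits 0..1: requested privilege level
    return index, table, rpl

print(decode_selector(0x7b))   # (15, 'GDT', 3)
```

So a selector of 0x7b refers to entry 15 of the GDT and requests ring 3.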

If you use Unicorn, it is usually not necessary to create a GDT and set the GDT register (GDTR). In some cases, however, the segment registers have to be set to proper values, e.g. when executing Linux or Windows binaries.

To write more readable code I use some constants:

F_GRANULARITY = 0x8	# If set block=4KiB otherwise block=1B
F_PROT_32 = 0x4		# Protected Mode 32 bit
F_LONG = 0x2		# Long Mode
F_AVAILABLE = 0x1 	# Free Use
A_PRESENT = 0x80	# Segment active
A_PRIV_3 = 0x60		# Ring 3 Privs
A_PRIV_2 = 0x40		# Ring 2 Privs
A_PRIV_1 = 0x20		# Ring 1 Privs
A_PRIV_0 = 0x0		# Ring 0 Privs
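A_CODE = 0x10		# Code Segment (descriptor type bit, combine with A_EXEC)
A_DATA = 0x10		# Data Segment (descriptor type bit, without A_EXEC)
A_TSS = 0x0		# TSS
A_GATE = 0x0		# GATE
A_EXEC = 0x8		# Executable
A_DATA_WRITABLE = 0x2	# Data segment is writable
A_CODE_READABLE = 0x2	# Code segment is readable
A_DIR_CON_BIT = 0x4	# Direction bit (data) / conforming bit (code)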
S_GDT = 0x0		# Index points to GDT
S_LDT = 0x4		# Index points to LDT
S_PRIV_3 = 0x3		# Ring 3 Privs
S_PRIV_2 = 0x2		# Ring 2 Privs
S_PRIV_1 = 0x1		# Ring 1 Privs
S_PRIV_0 = 0x0		# Ring 0 Privs
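
As a quick illustration of how these constants combine, the sketch below builds the access byte for a ring 3 data segment and a ring 3 code segment and reads a few bits back. The helper `describe_access` is a hypothetical name I use for checking only; the constants are repeated so the snippet runs on its own.

```python
A_PRESENT = 0x80
A_PRIV_3 = 0x60
A_CODE = 0x10
A_DATA = 0x10
A_EXEC = 0x8
A_DATA_WRITABLE = 0x2
A_CODE_READABLE = 0x2

# Access bytes for a writable ring 3 data segment and a readable ring 3 code segment
data_access = A_PRESENT | A_PRIV_3 | A_DATA | A_DATA_WRITABLE
code_access = A_PRESENT | A_PRIV_3 | A_CODE | A_EXEC | A_CODE_READABLE

def describe_access(access):
    """Hypothetical helper: read back some fields of an access byte."""
    return {
        'present': bool(access & 0x80),     # bit 7
        'dpl': (access >> 5) & 0x3,         # bits 5..6: privilege level
        'executable': bool(access & 0x8),   # bit 3
    }

print(hex(data_access), describe_access(data_access))   # 0xf2 ...
print(hex(code_access), describe_access(code_access))   # 0xfa ...
```

Since A_CODE and A_DATA share the same value, only A_EXEC tells a code segment apart from a data segment.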

To create the GDT and the selectors for the segment registers I wrote a few small helper functions:

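from struct import pack

def create_selector(idx, flags):
    # Selector layout: index in bits 3-15, table indicator and RPL in bits 0-2
    to_ret = flags
    to_ret |= idx << 3
    return to_ret

def create_gdt_entry(base, limit, access, flags):
    # Pack the fields into the 8-byte descriptor layout
    to_ret = limit & 0xffff                    # limit bits 0-15
    to_ret |= (base & 0xffffff) << 16          # base bits 0-23
    to_ret |= (access & 0xff) << 40            # access byte
    to_ret |= ((limit >> 16) & 0xf) << 48      # limit bits 16-19
    to_ret |= (flags & 0xf) << 52              # flags (4 bits only)
    to_ret |= ((base >> 24) & 0xff) << 56      # base bits 24-31
    return pack('<Q', to_ret)                  # little-endian 64-bit value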

def write_gdt(uc, gdt, mem):
    for idx, value in enumerate(gdt):
        offset = idx * GDT_ENTRY_SIZE
        uc.mem_write(mem + offset, value)
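
To check that the bit layout is right, it helps to unpack an entry again and compare the fields. The following is a minimal sketch: `parse_gdt_entry` is a hypothetical inverse that I add only for verification, and `create_gdt_entry` is restated (with the flags masked to 4 bits) so the snippet is self-contained.

```python
from struct import pack, unpack

def create_gdt_entry(base, limit, access, flags):
    # Restated here so the snippet runs on its own
    entry = limit & 0xffff
    entry |= (base & 0xffffff) << 16
    entry |= (access & 0xff) << 40
    entry |= ((limit >> 16) & 0xf) << 48
    entry |= (flags & 0xf) << 52
    entry |= ((base >> 24) & 0xff) << 56
    return pack('<Q', entry)

def parse_gdt_entry(raw):
    """Hypothetical inverse of create_gdt_entry, for checking only."""
    (entry,) = unpack('<Q', raw)
    limit = (entry & 0xffff) | (((entry >> 48) & 0xf) << 16)
    base = ((entry >> 16) & 0xffffff) | (((entry >> 56) & 0xff) << 24)
    access = (entry >> 40) & 0xff
    flags = (entry >> 52) & 0xf
    return base, limit, access, flags

raw = create_gdt_entry(0x5000, 0x1000, 0xf2, 0x4)
print(raw.hex())
print(parse_gdt_entry(raw))   # (20480, 4096, 242, 4)
```

Note that the limit field has only 20 bits, so any higher bits of a larger limit value are dropped when the entry is packed.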

In my example I create three memory regions:

  • Memory for GDT
  • Memory for the segment
  • Memory for the code

CODE_ADDR = 0x40000
CODE_SIZE = 0x1000
CODE = b'some code bytes here'  # placeholder for the machine code, given as bytes

GDT_ADDR = 0x3000
GDT_LIMIT = 0x1000
GDT_ENTRY_SIZE = 0x8

SEGMENT_ADDR = 0x5000
SEGMENT_SIZE = 0x1000

from unicorn import *
from unicorn.x86_const import *

uc = Uc(UC_ARCH_X86, UC_MODE_32)
uc.mem_map(GDT_ADDR, GDT_LIMIT)
uc.mem_map(SEGMENT_ADDR, SEGMENT_SIZE)
uc.mem_map(CODE_ADDR, CODE_SIZE)
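
As far as I know, Unicorn expects the address and size passed to mem_map to be aligned to 4 KiB, and mapped regions must not overlap. The sketch below checks both for the three regions used here without touching Unicorn itself. The helpers `page_align` and `overlaps` are hypothetical names of my own, and the 4 KiB page size is an assumption.

```python
PAGE_SIZE = 0x1000   # assumed page granularity for mem_map

def page_align(addr, size):
    """Round a region outwards to page boundaries; returns (start, length)."""
    start = addr & ~(PAGE_SIZE - 1)
    end = (addr + size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)
    return start, end - start

def overlaps(a, b):
    """True if two (start, length) regions share at least one byte."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]

regions = {
    'gdt': (0x3000, 0x1000),
    'segment': (0x5000, 0x1000),
    'code': (0x40000, 0x1000),
}
aligned = {name: page_align(*region) for name, region in regions.items()}
print(aligned == regions)   # True: the regions are already page aligned
```

A region such as (0x40010, 0x20) would be widened to (0x40000, 0x1000) by `page_align`.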

Now I fill the memory and the registers with the needed values. The first entry of the GDT has to be 0 (the null descriptor), therefore I initialise the whole table with empty entries and place the real descriptors at indices 15 to 18.

# Create the GDT entries
gdt = [create_gdt_entry(0, 0, 0, 0) for i in range(31)]
gdt[15] = create_gdt_entry(SEGMENT_ADDR, SEGMENT_SIZE, A_PRESENT | A_DATA | A_DATA_WRITABLE | A_PRIV_3 | A_DIR_CON_BIT, F_PROT_32)  # GS Segment

gdt[16] = create_gdt_entry(0, 0xfffff000, A_PRESENT | A_DATA | A_DATA_WRITABLE | A_PRIV_3 | A_DIR_CON_BIT, F_PROT_32)  # Data Segment

gdt[17] = create_gdt_entry(0, 0xfffff000, A_PRESENT | A_CODE | A_CODE_READABLE | A_EXEC | A_PRIV_3 | A_DIR_CON_BIT, F_PROT_32)  # Code Segment

gdt[18] = create_gdt_entry(0, 0xfffff000, A_PRESENT | A_DATA | A_DATA_WRITABLE | A_PRIV_0 | A_DIR_CON_BIT, F_PROT_32)  # Stack Segment

write_gdt(uc, gdt, GDT_ADDR)

# Fill the GDTR register
uc.reg_write(UC_X86_REG_GDTR, (0, GDT_ADDR, len(gdt)*GDT_ENTRY_SIZE-1, 0x0))

# Set the selector
selector = create_selector(15, S_GDT | S_PRIV_3)
uc.reg_write(UC_X86_REG_GS, selector)
selector = create_selector(16, S_GDT | S_PRIV_3)
uc.reg_write(UC_X86_REG_DS, selector)
selector = create_selector(17, S_GDT | S_PRIV_3)
uc.reg_write(UC_X86_REG_CS, selector)
selector = create_selector(18, S_GDT | S_PRIV_0)
uc.reg_write(UC_X86_REG_SS, selector)
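
To see which values actually end up in the segment registers, the selectors can be computed without Unicorn. This is a minimal sketch that repeats `create_selector` and the relevant constants; the helper `selector_table` is a hypothetical name of my own. It also derives the address of each descriptor and checks it against the GDTR limit of a 31-entry table.

```python
GDT_ADDR = 0x3000
GDT_ENTRY_SIZE = 0x8
GDT_ENTRIES = 31
S_GDT = 0x0
S_PRIV_3 = 0x3
S_PRIV_0 = 0x0

def create_selector(idx, flags):
    return flags | (idx << 3)

def selector_table():
    """Hypothetical helper: (register, selector, descriptor address) per entry."""
    gdtr_limit = GDT_ENTRIES * GDT_ENTRY_SIZE - 1
    rows = []
    for name, idx, rpl in [('GS', 15, S_PRIV_3), ('DS', 16, S_PRIV_3),
                           ('CS', 17, S_PRIV_3), ('SS', 18, S_PRIV_0)]:
        selector = create_selector(idx, S_GDT | rpl)
        offset = selector & ~0x7               # drop TI and RPL bits
        assert offset + GDT_ENTRY_SIZE - 1 <= gdtr_limit
        rows.append((name, selector, GDT_ADDR + offset))
    return rows

for name, selector, addr in selector_table():
    print(name, hex(selector), hex(addr))
# GS 0x7b 0x3078
# DS 0x83 0x3080
# CS 0x8b 0x3088
# SS 0x90 0x3090
```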

The last step is to execute the code which uses the segment registers to access memory.

# Write the code to its memory region and start the emulation
uc.mem_write(CODE_ADDR, CODE)
uc.emu_start(CODE_ADDR, CODE_ADDR + len(CODE))

Here you can download the complete script.