FileBytes - Introduction
This is a short introduction to the filebytes module. filebytes is a Python module that can be used to read and write the following file formats:
- Executable and Linking Format (ELF),
- Portable Executable (PE),
- MachO and
- OAT (Android Runtime).
Open files
Each file type (elf, pe, mach_o, oat) has its own module, which has to be imported. Each module defines all the types needed to parse that file type. To open a file, use the class corresponding to the file type you want to read (ELF, PE, MachO, OAT).
from filebytes.elf import *
fileName = 'afile'
elffile = ELF(fileName)
# It is also possible to pass the bytes of a file directly.
with open(fileName, 'rb') as f:
    data = f.read()
elffile = ELF('any filename', data) # The filename is not used
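Choosing the right class requires knowing the file type up front. The following is a minimal sketch of guessing the type from the leading magic bytes, using only the standard library. The helper `guess_format` is hypothetical and not part of filebytes; the magic values come from the respective format specifications.

```python
import struct

# Magic values taken from the respective format specifications.
ELF_MAGIC = b'\x7fELF'
PE_DOS_MAGIC = b'MZ'
MACHO_MAGICS = {
    0xfeedface,  # 32-bit
    0xfeedfacf,  # 64-bit
    0xcefaedfe,  # 32-bit, opposite byte order
    0xcffaedfe,  # 64-bit, opposite byte order
}


def guess_format(data):
    """Return 'elf', 'pe', 'mach_o' or None for the given bytes.

    Hypothetical helper, not part of filebytes. OAT files are reported
    as 'elf' because an OAT file is an ELF file.
    """
    if data[:4] == ELF_MAGIC:
        return 'elf'
    if data[:2] == PE_DOS_MAGIC:
        return 'pe'
    if len(data) >= 4 and struct.unpack('<I', data[:4])[0] in MACHO_MAGICS:
        return 'mach_o'
    return None
```

The returned string could then be used to decide whether to construct ELF, PE or MachO from the same bytes.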
Data Access
When the file is opened, its content is parsed and you can access the data via several properties. The file's data is generally held in containers, each of which has several attributes. The number and the names of these attributes depend on the header field the container represents. If you want to access the structure that was used to parse the data, use the attribute 'header'. All container types have at least this attribute.
container = file.aHeaderContainer
print(container.header.aHeaderField)
If the header structure points to another region in the file, you can access this data via the 'bytes' attribute. This attribute holds a bytearray containing the bytes of the region the header structure points to.
container = file.aHeaderContainer
data = container.bytes
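To illustrate the container idea, here is a self-contained sketch that does not use filebytes at all. `Container`, `ExampleHeader` and `parse_example` are made-up stand-ins; the sketch only assumes that a header structure is parsed from the file and that the region it points to is copied into a bytearray.

```python
from ctypes import LittleEndianStructure, c_uint32, sizeof


class Container(object):
    """Simple attribute bag; a stand-in for the library's container type."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


class ExampleHeader(LittleEndianStructure):
    """Made-up header that points to a region elsewhere in the file."""
    _fields_ = [
        ('offset', c_uint32),  # where the referenced region starts
        ('size', c_uint32),    # how many bytes it spans
    ]


def parse_example(data, header_offset=0):
    """Parse an ExampleHeader and collect the bytes it points to."""
    raw = bytes(data[header_offset:header_offset + sizeof(ExampleHeader)])
    header = ExampleHeader.from_buffer_copy(raw)
    region = bytearray(data[header.offset:header.offset + header.size])
    return Container(header=header, bytes=region)


# Build a tiny fake file: an 8-byte header followed by the payload.
payload = b'hello'
fake_file = bytearray(b'\x08\x00\x00\x00\x05\x00\x00\x00' + payload)
container = parse_example(fake_file)
print(container.header.offset, container.header.size, container.bytes)
```

The parsed structure stays reachable through 'header', while 'bytes' gives direct access to the data the header describes.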
All file types have the attributes 'imageBase' and 'entryPoint'.
ib = file.imageBase
ep = file.entryPoint
ELF
To access the data, the ELF class provides the following properties:
elfheader = elffile.elfHeader # The ELF header
# elf structure: EHDR
segments = elffile.segments # returns a list of segments
# elf structure: PHDR
segments = elffile.programHeaders # same as segments
sections = elffile.sections # returns a list of all sections in the file
# elf structure: SHDR
For example, to access the .got section:
ls = ELF('/bin/ls')
got = [s for s in ls.sections if s.name == '.got'][0]
print('Name: %s' % got.name)
print('Offset: 0x%x' % got.header.sh_offset)
print('Address: 0x%x' % got.header.sh_addr)
print('Size: 0x%x' % got.header.sh_size)
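The fields sh_offset, sh_addr and sh_size come from the ELF section header. As a rough illustration of where they sit, the sketch below unpacks a 64-bit section header with the standard struct module, following the Elf64_Shdr layout from the ELF specification. `Shdr64` and `parse_shdr64` are hypothetical names, the sample values are made up, and this is not how filebytes itself parses the data.

```python
import struct
from collections import namedtuple

# Elf64_Shdr layout from the ELF specification (64 bytes).
SHDR64_FORMAT = 'IIQQQQIIQQ'
Shdr64 = namedtuple('Shdr64', [
    'sh_name', 'sh_type', 'sh_flags', 'sh_addr', 'sh_offset',
    'sh_size', 'sh_link', 'sh_info', 'sh_addralign', 'sh_entsize',
])


def parse_shdr64(data, offset=0, little_endian=True):
    """Unpack one 64-bit section header starting at `offset`."""
    fmt = ('<' if little_endian else '>') + SHDR64_FORMAT
    return Shdr64(*struct.unpack_from(fmt, data, offset))


# Synthetic header for illustration; the values are made up.
raw = struct.pack('<IIQQQQIIQQ', 1, 1, 3, 0x601000, 0x1000, 0x28, 0, 0, 8, 8)
shdr = parse_shdr64(raw)
print('Offset: 0x%x' % shdr.sh_offset)
print('Address: 0x%x' % shdr.sh_addr)
print('Size: 0x%x' % shdr.sh_size)
```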
The following containers are used for those properties:
class EhdrData(Container):
"""
header = ElfHeader (EHDR)
"""
class ShdrData(Container):
"""
header = SectionHeader (SHDR)
name = string (section name)
bytes = bytearray (section bytes)
raw = c_ubyte_array
.dynamic:
    content = list of DYN entries
.rel & .rela:
    relocations = list of REL or RELA
.dynsym & .symtab:
    symbols = list of SYM entries
"""
class PhdrData(Container):
"""
type = Program Header Type (PHDR)
header = ProgrammHeader
bytes = bytearray (segment bytes)
raw = c_ubyte_array
vaddr = virtual address (int)
offset = offset
"""
class SymbolData(Container):
"""
header = Symbol
name = string
type = int
bind = bind
"""
class RelocationData(Container):
"""
header = Relocation
symbol = SymbolData
type = type of relocation
"""
class DynamicData(Container):
"""
header = DYN
tag = value of class DT
"""
PE
To access the data, the PE class provides the following properties:
idh = pefile.imageDosHeader
inh = pefile.imageNtHeaders
sections = pefile.sections # a list of all sections
dataDirectory = pefile.dataDirectory # a list of the data that the DataDirectory entries in the OptionalHeader point to
# example
imports = pefile.dataDirectory[ImageDirectoryEntry.IMPORT]
firstThunk = imports[0].header.OriginalFirstThunk
print(imports[0].dllName)
print(len(imports[0].importNameTable))
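For orientation, the DOS header and the NT headers are linked through the e_lfanew field of the DOS header. The sketch below follows that link with the standard struct module and reads two fields of the COFF file header. `read_coff_header` is a hypothetical helper written against the PE format documentation, not a filebytes function, and the sample image is synthetic.

```python
import struct


def read_coff_header(data):
    """Return (machine, number_of_sections) from PE bytes.

    Hypothetical helper, not part of filebytes.
    """
    if data[:2] != b'MZ':
        raise ValueError('missing DOS header magic')
    # e_lfanew at offset 0x3c holds the file offset of the NT headers.
    (e_lfanew,) = struct.unpack_from('<I', data, 0x3c)
    if data[e_lfanew:e_lfanew + 4] != b'PE\x00\x00':
        raise ValueError('missing PE signature')
    # The COFF file header follows the 4-byte signature.
    machine, number_of_sections = struct.unpack_from('<HH', data, e_lfanew + 4)
    return machine, number_of_sections


# Minimal synthetic image: DOS header stub, signature, start of COFF header.
image = bytearray(0x40)
image[:2] = b'MZ'
struct.pack_into('<I', image, 0x3c, 0x40)
image += b'PE\x00\x00' + struct.pack('<HH', 0x8664, 3)
print(read_coff_header(image))
```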
The following containers are used for those properties:
class ImageDosHeaderData(Container):
"""
header = IMAGE_DOS_HEADER
"""
class ImageNtHeaderData(Container):
"""
header = IMAGE_NT_HEADERS
"""
class SectionData(Container):
"""
header = IMAGE_SECTION_HEADER
name = name of the section (str)
bytes = bytes of section (bytearray)
raw = bytes of section (c_ubyte_array)
"""
class ImportDescriptorData(Container):
"""
header = IMAGE_IMPORT_DESCRIPTOR
dllName = name of dll (str)
importNameTable = list of IMAGE_THUNK_DATA
importAddressTable = list of IMAGE_THUNK_DATA
"""
class ImportByNameData(Container):
"""
header = IMAGE_IMPORT_BY_NAME
name = name of function (str)
"""
class ThunkData(Container):
"""
header = IMAGE_THUNK_DATA
rva = relative virtual address of thunk
ordinal = None | Ordinal
importByName = None | ImportByNameData
"""
class ExportDirectoryData(Container):
"""
header = IMAGE_EXPORT_DIRECTORY
name = name of dll (str)
functions = list of FunctionData
"""
class FunctionData(Container):
"""
name = name of the function (str)
ordinal = ordinal (int)
rva = relative virtual address of function (int)
"""
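In ThunkData, either 'ordinal' or 'importByName' is set and the other is None. In the PE format this distinction is encoded in the highest bit of the raw thunk value. The sketch below shows that rule as described in the PE format documentation; `decode_thunk` is a hypothetical helper, and how filebytes decides internally is not covered here.

```python
ORDINAL_FLAG32 = 0x80000000
ORDINAL_FLAG64 = 0x8000000000000000


def decode_thunk(value, is_64bit=False):
    """Classify a raw thunk value from the import name table.

    Returns ('ordinal', n) for an import by ordinal, or ('name_rva', rva)
    when the value is the RVA of an IMAGE_IMPORT_BY_NAME structure.
    """
    flag = ORDINAL_FLAG64 if is_64bit else ORDINAL_FLAG32
    if value & flag:
        # Only the low 16 bits carry the ordinal number.
        return ('ordinal', value & 0xffff)
    # Otherwise the low 31 bits are the RVA of the hint/name entry.
    return ('name_rva', value & 0x7fffffff)
```

For example, `decode_thunk(0x80000010)` classifies the value as an import by ordinal, while `decode_thunk(0x2f40)` treats it as the RVA of a name entry.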
MachO
To access the data, the MachO class provides two properties:
mach_header = macho.machHeader
load_commands = macho.loadCommands
The following containers are used for those properties:
class MachHeaderData(Container):
"""
header = MachHeader
"""
class LoadCommandData(Container):
"""
header = LoaderCommand
bytes = bytes of the command (bytearray)
raw = bytes of the command (c_ubyte_array)
SegmentCommand:
    sections = list of SectionData
UuidCommand:
    uuid = uuid (str)
TwoLevelHintsCommand:
    twoLevelHints = list of TwoLevelHintData
DylibCommand:
    name = name of dylib (str)
DylinkerCommand:
    name = name of dynamic linker
"""
class SectionData(Container):
"""
header = Section
"""
class TwoLevelHintData(Container):
"""
header = TwoLevelHint
"""
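The relationship between the Mach-O header and its load commands can be sketched with the standard struct module: the header states how many commands follow, and every command begins with its type and total size. The code below handles only little-endian 64-bit files, uses a synthetic sample, and `iter_load_commands` is a hypothetical helper rather than part of filebytes.

```python
import struct

MH_MAGIC_64 = 0xfeedfacf
# magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved
MACH_HEADER_64 = '<IiiIIIII'
LC_UUID = 0x1b


def iter_load_commands(data):
    """Yield (cmd, command_bytes) for a little-endian 64-bit Mach-O."""
    header = struct.unpack_from(MACH_HEADER_64, data, 0)
    if header[0] != MH_MAGIC_64:
        raise ValueError('not a little-endian 64-bit Mach-O file')
    ncmds = header[4]
    offset = struct.calcsize(MACH_HEADER_64)
    for _ in range(ncmds):
        # Every load command starts with its type and total size.
        cmd, cmdsize = struct.unpack_from('<II', data, offset)
        if cmdsize < 8:
            raise ValueError('invalid load command size')
        yield cmd, bytes(data[offset:offset + cmdsize])
        offset += cmdsize


# Synthetic file: a header followed by a single LC_UUID command.
uuid_cmd = struct.pack('<II', LC_UUID, 24) + bytes(bytearray(range(16)))
macho = struct.pack(MACH_HEADER_64, MH_MAGIC_64, 0x01000007, 3, 2,
                    1, len(uuid_cmd), 0, 0) + uuid_cmd
commands = list(iter_load_commands(macho))
print(len(commands), hex(commands[0][0]))
```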
OAT
Since an OAT file is an ELF file, you can use all the properties of ELF. Additionally, the OAT class provides two OAT-specific properties:
- oatHeader
- oatDexHeader
oat_header = oat.oatHeader
oat_dex_header = oat.oatDexHeader # a list of OatDexFileHeaderData
The following containers are used for those properties:
class OatHeaderData(Container):
"""
header = OatHeader
keyValueStoreRaw = c_ubyte_array
keyValueStore = dict
"""
class OatDexFileHeaderData(Container):
"""
header = OatDexFileHeader w/o dexFileLocationSize and dexFileLocation
name = name of the dex file (str)
classOffsets = c_uint_array
dexHeader = DexHeader
dexRaw = c_ubyte_array
dexBytes = bytearray
oatClasses = list of OatClasses
"""
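OatHeaderData exposes the key-value store both as raw bytes and as a dict. A minimal sketch of that conversion follows, assuming the raw store is a sequence of NUL-terminated key and value strings, as in the Android Runtime's OAT header. `parse_key_value_store` is a hypothetical helper, the sample keys are illustrative, and whether filebytes builds 'keyValueStore' in exactly this way is an assumption.

```python
def parse_key_value_store(raw):
    """Parse alternating NUL-terminated keys and values into a dict."""
    parts = bytes(raw).split(b'\x00')
    # A trailing terminator leaves one empty item at the end.
    if parts and parts[-1] == b'':
        parts.pop()
    if len(parts) % 2:
        raise ValueError('key without a value')
    keys = parts[0::2]
    values = parts[1::2]
    return {k.decode('ascii'): v.decode('ascii') for k, v in zip(keys, values)}


# Illustrative sample data, not taken from a real file.
store = parse_key_value_store(b'dex2oat-host\x00X86_64\x00pic\x00false\x00')
print(store)
```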