A virtual machine written in Python that executes x86 binaries according to the Intel Software Developer's Manual
PyVM executes x86 (IA-32) bytecode in pure Python, without any dependencies.
It can run multiple types of executables, including raw bytecode (bytes and bytearrays are interpreted as bytecode).
Features:
CPU (VM/Registers.py, VM/CPU.py, VM/fetchLoop.py, VM/misc.py)
  EFLAGS register. See file #1;
FPU (VM/FPU.py)
  binary80;
  control, status and flag registers;
Memory (VM/Memory.py)
Instructions (VM/instructions/*)
  and, or, xor, test, neg, not, sal, sar, shl, shr, shld, shrd;
  nop, jmp, jcc, setcc, cmovcc, bt, int, call, ret, enter, leave, cpuid;
  fld, fst, fstp, fist, fistp, fmul, fmulp, fimul, faddp, fdiv, fdivp, fucom, fucomp, fucompp, fcomi, fcomip, fucomip, fucomipp, fldcw, fstcw, fnstcw, fxch;
  add, sub, cmp, adc, sbb, inc, dec, mul, imul, div, idiv;
  mov, movs, movsx, movsxd, movzx, push, pop, lea, xchg, cmpxchg, cbw, cwde, cwd, cdq, cmc, clc, cld, stc, std, bsf, bsr;
  stos.
Kernel (VM/kernel/kernel.py, VM/kernel/kernel_filesystem.py, VM/kernel/kernel_memory.py, VM/kernel/kernel_sys.py)
  VM/__init__.py:VM.interrupt;
  sys_read, sys_write, sys_writev, sys_open, sys_close, sys_unlink, sys_llseek. See file #2;
  brk, sys_set_thread_area, sys_set_tid_address, mmap, munmap. See file #3;
  sys_exit, sys_exit_group, sys_clock_gettime, sys_ioctl, sys_newuname. See file #4.
Command-line interface (VM/__main__.py)
  Go to PyVM-master (or wherever you downloaded PyVM);
  run a program (for example, ./C/real_life/nasm -h) like this: python3 -OO -m VM 'C/real_life/nasm -h'
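The feature list above mentions the EFLAGS register. As a rough illustration of what maintaining it involves, here is a minimal sketch of how the carry, zero, sign and overflow flags can be derived for a 32-bit add, following the flag definitions in the Intel manual. The function name add32_with_flags and the dictionary layout are hypothetical and chosen for this example only; this is not taken from PyVM's source.

```python
from typing import Dict, Tuple

MASK32 = 0xFFFFFFFF   # keeps a value within 32 bits
SIGN32 = 0x80000000   # bit 31, the sign bit of a 32-bit value


def add32_with_flags(a: int, b: int) -> Tuple[int, Dict[str, bool]]:
    """Add two 32-bit values; return the truncated result and CF, ZF, SF, OF.

    Hypothetical helper for illustration, not PyVM's API.
    """
    a &= MASK32
    b &= MASK32
    full = a + b              # untruncated sum, may need 33 bits
    result = full & MASK32    # what a 32-bit register would hold

    flags = {
        "CF": full > MASK32,                 # unsigned carry out of bit 31
        "ZF": result == 0,                   # result is zero
        "SF": bool(result & SIGN32),         # sign bit of the result
        # Signed overflow: both operands share a sign and the result's differs.
        "OF": bool(~(a ^ b) & (a ^ result) & SIGN32),
    }
    return result, flags


if __name__ == "__main__":
    print(add32_with_flags(0xFFFFFFFF, 1))   # wraps to 0: CF and ZF set
    print(add32_with_flags(0x7FFFFFFF, 1))   # signed overflow: SF and OF set
```

Conditional instructions such as jcc, setcc and cmovcc read these bits, which is why arithmetic instructions have to update them on every execution.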
Simple example:
import VM  # import the module


def parse_code(code: str) -> bytes:
    # This just converts the prettified code below to the raw, ugly bytecode. You can ignore this function.
    import re

    binary = ''
    regex = re.compile(r"[0-9a-f]+:\s+([^;]+)\s*;.*", re.DOTALL)

    for i, line in enumerate(code.strip().splitlines(keepends=False)):
        if line.startswith(';'):
            continue

        match = regex.match(line)
        assert match is not None, f"Could not parse code (line {i})"
        binary += match.group(1)

    return bytes.fromhex(binary)


if __name__ == "__main__":
    # This is the bytecode we'll run
    code = """
; section .text
; _start:
0:  b8 04 00 00 00    ;mov eax,0x4   ; SYS_WRITE
5:  bb 01 00 00 00    ;mov ebx,0x1   ; STDOUT
a:  b9 29 00 00 00    ;mov ecx,0x29  ; address of the message
f:  ba 0e 00 00 00    ;mov edx,0xe   ; length of the message
14: cd 80             ;int 0x80      ; interrupt kernel
16: e9 02 00 00 00    ;jmp 0x1d      ; _exit
1b: 89 c8             ;mov eax,ecx   ; this is here to mess things up if JMP doesn't work
; _exit:
1d: b8 01 00 00 00    ;mov eax,0x1   ; SYS_EXIT
22: bb 00 00 00 00    ;mov ebx,0x0   ; EXIT_SUCCESS
27: cd 80             ;int 0x80      ; interrupt kernel
; section .data
29: 48 65 6C 6C 6F 2C 20 77 6F 72 6C 64 21 0A ; "Hello, world!",10
"""

    vm = VM.VMKernel(500)  # Initialize the VM with the Linux kernel and give it 500 bytes of memory.

    # EXECUTE IT!
    vm.execute(
        VM.ExecutionStrategy.BYTES,  # We're executing raw bytecode
        parse_code(code)  # This is the actual bytecode
    )
Output:
Hello, world!
[!] Process exited with code 0
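To see why the bytecode in the example prints that message, it can help to decode its operands by hand. The sketch below does not use PyVM at all: it rebuilds the same bytes, reads the little-endian immediates with the standard struct module, and checks the message address, the message length and the jmp target. It assumes the bytecode is loaded at address 0, so that 0x29 is both a file offset and a memory address; the helper name imm32 is made up for this illustration.

```python
import struct

# The same program as in the example above, as raw bytes.
program = bytes.fromhex(
    "b8 04 00 00 00"   # 0x00: mov eax, 0x4   (SYS_WRITE)
    "bb 01 00 00 00"   # 0x05: mov ebx, 0x1   (STDOUT)
    "b9 29 00 00 00"   # 0x0a: mov ecx, 0x29  (address of the message)
    "ba 0e 00 00 00"   # 0x0f: mov edx, 0xe   (length of the message)
    "cd 80"            # 0x14: int 0x80
    "e9 02 00 00 00"   # 0x16: jmp rel32
    "89 c8"            # 0x1b: mov eax, ecx   (skipped by the jmp)
    "b8 01 00 00 00"   # 0x1d: mov eax, 0x1   (SYS_EXIT)
    "bb 00 00 00 00"   # 0x22: mov ebx, 0x0   (EXIT_SUCCESS)
    "cd 80"            # 0x27: int 0x80
    "48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 0a"  # 0x29: "Hello, world!\n"
)


def imm32(code: bytes, opcode_offset: int) -> int:
    """Read the unsigned 32-bit immediate that follows a one-byte opcode."""
    return struct.unpack_from("<I", code, opcode_offset + 1)[0]


msg_addr = imm32(program, 0x0A)   # operand of mov ecx
msg_len = imm32(program, 0x0F)    # operand of mov edx
message = program[msg_addr:msg_addr + msg_len]

# jmp rel32 is relative to the address of the NEXT instruction (0x16 + 5).
rel32 = struct.unpack_from("<i", program, 0x16 + 1)[0]
jmp_target = 0x16 + 5 + rel32

print(message.decode("ascii"), end="")
print(f"jmp lands at {jmp_target:#x}")
```

The jmp lands on the mov eax,0x1 at 0x1d, skipping the two-byte mov eax,ecx that would otherwise overwrite the syscall number.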
Please see example_BYTES.py, example_FLAT.py and example_ELF.py for more examples of usage.
Also see the READMEs in other directories: for example, VM/instructions, VM/kernel and many more.
0.1-beta is almost two times faster than the commit 453fb47617f269fd8fa4ebe7c8cb28cc0611ede0 on master.
The kernel (as of 0.1-beta) is a huge stub with a minimal set of syscalls that allow basic programs to work.
There are still TODOs and bugs.
You're welcome to contribute! Open issues, pull requests, contact me via Twitter or Reddit. Learn more about the x86 architecture and the Linux kernel and have fun!