Wednesday, 29 February 2012

Automation of Reversing Through Scripting - Amit Malik

Introduction

This article shows how to become a smarter reverser by automating your reverse engineering tasks through scripting.
It is part of our free "Reverse Engineering & Malware Analysis Course" [Reference 1]. It is primarily written as additional learning material for our session 'Part 5 - Reverse Engineering Tools', in which we are going to demonstrate important reversing tools.
You can visit our training page here [Reference 1] and find all the presentations from previous sessions here [Reference 2].
Reverse engineering is a demanding task, especially when we analyse large applications, packed files such as malware, or normal applications for vulnerabilities.
Some of the common tasks include:
  • Tracking memory allocation
  • Tracking specific API calls
  • Unpacking a family of malware
  • Intelligent decision making based on some specific events
These are just some simple examples where automation helps a great deal. For example, let's say that we want to monitor HeapAlloc calls in an application. The application may call HeapAlloc hundreds of times, but we only want to log calls with specific values, such as allocation requests greater than 1024 bytes. A simple script gives us this information virtually on the spot. Done manually, we would have to set a breakpoint on HeapAlloc and check on every hit whether the allocation size is greater than 1024 bytes, which increases the analysis time for such a simple task.
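The decision at the heart of that HeapAlloc example is small enough to sketch without any debugger. The snippet below is only an illustration in plain Python: the list of sizes is made-up sample data standing in for the dwBytes argument a script would read at each HeapAlloc breakpoint, and the function name is hypothetical.

```python
# Debugger-independent sketch: the sizes below are made-up sample data,
# standing in for the dwBytes argument seen at each HeapAlloc breakpoint.
THRESHOLD = 1024  # log only requests larger than this many bytes


def large_allocations(sizes, threshold=THRESHOLD):
    """Return (call_index, size) for every request larger than threshold."""
    return [(i, size) for i, size in enumerate(sizes) if size > threshold]


if __name__ == "__main__":
    observed = [16, 4096, 1024, 2048, 64]   # hypothetical HeapAlloc sizes
    for index, size in large_allocations(observed):
        print("call #%d requested %d bytes" % (index, size))
```

In a real script the same comparison would sit inside the breakpoint handler, so that uninteresting calls are skipped without stopping the target.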


In this article, I will show you how to automate some of these common tasks through scripting for the main reversing debuggers, i.e. Ollydbg, Immunity Debugger, Pydbg and Windbg, with practical code samples.
 
Ollydbg - Playing with OllyScript

Ollydbg [Reference 3] is one of the best ring 3 (user-land) debuggers. It has a very nice GUI, is one of the most popular debuggers on the planet, and has very mature community support. Ollydbg is my all-time favourite debugger :)
Ollydbg doesn't support scripting natively, but it does support plugins, so people have written scripting plugins for it. The one that I will use in this article is Ollyscript by ShaG.
You can download Ollyscript from here [Reference 4].
Ollyscript comes with a nice help file. Its syntax is similar to assembly and very easy to understand. It supports most of the functionality you need, such as dumping memory and decision making.
Compared with the scripting environments of other debuggers, however, it is rather rigid. I will discuss this in more detail later in this article.
So let's understand the Ollydbg scripting environment, i.e. Ollyscript, with the help of a simple example.
 
Problem Statement:
Let's say we are analysing an application for a simple bug and we want to identify the function that is actually causing the problem. The function is deep inside the application, and finding it manually would take hours of analysis time.
So we want to track the execution flow from a specific point up to the function that is causing the problem. More precisely, I want to log the return address of each function.
Solution:
The above problem can be solved in multiple ways, but to demonstrate it simply I will use the following steps:
  1. From the current EIP, search for a call and set a breakpoint on it
  2. Step into the call
  3. Log the value at ESP (i.e. the return address) and search for a call from the return address onwards
  4. Set a breakpoint on that call
  5. Repeat steps 1, 2 and 3 inside the call
  6. Run
Below is a tiny script to accomplish this task. Please note that the script only demonstrates the concept; it may fail when a call follows a decision (conditional branch) instruction. :)
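/*
Author: Amit Malik
http://www.securityxploded.com
*/

// Run the breakprocess label whenever a breakpoint is hit
EOB breakprocess
var return
var infunction
var x
var y
// Start both searches from the current instruction
mov infunction,EIP
mov return,EIP

start:
// Next call (opcode E8) from the last return address
findop return,#E8#
mov x,$RESULT
// Next call (opcode E8) inside the current function
findop infunction,#E8#
mov y,$RESULT
// Set a breakpoint on each call that was found
cmp x,0
ja breaksetx
backx:
cmp y,0
ja breaksety
backy:
run

breakprocess:
// Step into the call; [esp] now holds its return address
sti
mov return,[esp]
msg return
sti
mov infunction,EIP
jmp start

breaksetx:
bp x
jmp backx

breaksety:
bp y
jmp backy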
Please refer to the Ollyscript help file [Reference 4] for more details. Here I will explain only the important keywords and terms. The script starts with EOB (Execute over breakpoint). As the name states, it executes the code under the label specified with EOB whenever a breakpoint is hit. In this script it executes the code under the breakprocess label.
var - declares a variable
mov - is similar to mov in assembly
findop - searches for an opcode from the specified address and stores the result in the $RESULT variable
run - is similar to F9 in Ollydbg
sti - step into, similar to F7 in Ollydbg
msg - shows a message box (log should be used, but I used msg just for visual pleasure :))
As you can see, the scripting is similar to assembly language. Most of the time people use Ollyscript for unpacking malware; I have never seen anyone use it for vulnerability analysis. It is not very flexible and is limited in its functionality, but it can be used for tasks that we want to automate in Ollydbg.
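To make the "search for E8" step more concrete outside Ollydbg, here is a rough Python sketch of the same idea applied to a raw byte buffer. It is not how Ollyscript implements findop: this is a naive byte scan, so it can report an E8 byte that is really part of another instruction, and the function name and sample bytes are hypothetical. It also shows how the destination of a near call is derived from the 32-bit relative operand that follows the E8 opcode.

```python
import struct


def find_call_candidates(code, base):
    """Yield (address, target) for each 0xE8 byte followed by a rel32.

    This is a naive byte scan: unlike a disassembler-aware search it can
    report an E8 that is really part of another instruction's operand.
    """
    offset = code.find(b"\xE8")
    while offset != -1 and offset + 5 <= len(code):
        rel32 = struct.unpack_from("<i", code, offset + 1)[0]
        address = base + offset
        # A near call's target is relative to the next instruction (5 bytes on)
        target = (address + 5 + rel32) & 0xFFFFFFFF
        yield address, target
        offset = code.find(b"\xE8", offset + 1)


if __name__ == "__main__":
    # Hypothetical code bytes mapped at 0x401000: nop, call, nop, call
    sample = b"\x90\xE8\x10\x00\x00\x00\x90\xE8\xF0\xFF\xFF\xFF"
    for address, target in find_call_candidates(sample, 0x401000):
        print("call at %08x -> %08x" % (address, target))
```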
Immunity Debugger

Immunity Debugger [Reference 3] is a Python-scriptable debugger with a GUI similar to Ollydbg's. It is developed by Immunity Inc., and according to Immunity it is the only debugger designed specifically for vulnerability research.
It has some very powerful pycommands like heap, lookasidelist etc. One of the major advantages of this debugger is that it provides a plethora of APIs for various reversing tasks and supports Python, which makes it one of the best debuggers for reversing.
In the reference section [Reference 6] you can find some good tutorials and projects based on Immunity Debugger. It also comes with a nice help file, so don't forget to check that as well.
Problem statement:
We want to find the addresses of all "jmp esp" instructions.
Solution Script:
You can use the script below directly in the Immunity Debugger Python shell.
data = "jmp esp"
asm = imm.assemble(data)   # imm is an immlib.Debugger object
results = imm.search(asm)

for addr in results:
    print "%s  %0.8x" % (data, addr)
The above 5 lines of code will give you all the "jmp esp" addresses. This is the beauty of scripting :)
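Conceptually, the search boils down to looking for the two bytes FF E4, which is the x86 encoding of "jmp esp". The plain-Python sketch below illustrates that idea on a byte buffer. It makes no claim about how Immunity Debugger implements its search, and the function name, buffer contents and base address are made up for the example.

```python
JMP_ESP = b"\xFF\xE4"   # x86 encoding of "jmp esp"


def find_pattern(memory, pattern, base=0):
    """Return the address of every (possibly overlapping) match of pattern."""
    hits = []
    offset = memory.find(pattern)
    while offset != -1:
        hits.append(base + offset)
        offset = memory.find(pattern, offset + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical 8-byte memory snapshot mapped at 0x7C900000
    snapshot = b"\x90\xFF\xE4\xC3\x55\xFF\xE4\x90"
    for addr in find_pattern(snapshot, JMP_ESP, base=0x7C900000):
        print("jmp esp  %08x" % addr)
```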
Pydbg

Pydbg [Reference 3] is a pure Python debugger. Pydbg is my favourite debugger: I use it for various automation tasks, and it is extremely flexible and powerful.
Problem Statement:
We want to track the VirtualAlloc API: whenever VirtualAlloc is called, our script should display its arguments and the returned pointer.
VirtualAlloc:
LPVOID WINAPI VirtualAlloc(
    __in_opt LPVOID lpAddress,
    __in     SIZE_T dwSize,
    __in     DWORD  flAllocationType,
    __in     DWORD  flProtect
);
Solution:
  1. Set a breakpoint on VirtualAlloc
  2. Extract the parameters from the stack
  3. Extract the return address from the stack and set a breakpoint on it
  4. Get the returned value from the EAX register
# Author: Amit Malik
# http://www.securityxploded.com


import sys
import pefile
import struct
from pydbg import *
from pydbg.defines import *


def read_stack_dword(dbg, offset):
    # Read the 32-bit little-endian value stored at [ESP + offset]
    raw = dbg.read_process_memory(dbg.context.Esp + offset, 4)
    return int(struct.unpack("<L", raw)[0])


def ret_addr_handler(dbg):
    # Hit when VirtualAlloc returns: EAX holds the returned pointer
    lpAddress = dbg.context.Eax
    print " Returned Pointer: ", hex(int(lpAddress))
    return DBG_CONTINUE


def virtual_handler(dbg):
    # Hit on entry to VirtualAlloc: [ESP] is the return address and
    # the arguments start at [ESP+4] (lpAddress, which we ignore)
    print "****************"
    print "Allocation Size: ", hex(read_stack_dword(dbg, 8))    # 2nd argument, dwSize
    print "Allocation Type: ", hex(read_stack_dword(dbg, 12))   # 3rd argument, flAllocationType
    print "Protection Type: ", hex(read_stack_dword(dbg, 16))   # 4th argument, flProtect

    # Break on the return address so we can read EAX after the call
    ret_addr = read_stack_dword(dbg, 0)
    dbg.bp_set(ret_addr, description="ret_addr breakpoint", restore=True, handler=ret_addr_handler)
    return DBG_CONTINUE


def entry_handler(dbg):
    # Hit at the entry point; resolve VirtualAlloc and break on it
    virtual_addr = dbg.func_resolve("kernel32.dll", "VirtualAlloc")   # Get VirtualAlloc address
    if virtual_addr:
        dbg.bp_set(virtual_addr, description="Virtualalloc breakpoint", restore=True, handler=virtual_handler)
    return DBG_CONTINUE


def main():
    path = sys.argv[1]
    pe = pefile.PE(path)
    # Entry point address = entry point RVA + image base
    entry_addr = pe.OPTIONAL_HEADER.AddressOfEntryPoint + pe.OPTIONAL_HEADER.ImageBase
    dbg = pydbg()          # get pydbg object
    dbg.load(path)
    dbg.bp_set(entry_addr, description="Entry point breakpoint", restore=True, handler=entry_handler)
    dbg.run()


if __name__ == '__main__':
    main()
      
Notice that this script first sets a breakpoint on the entry point and only then on VirtualAlloc, rather than directly on VirtualAlloc, because pydbg does not support deferred breakpoints. I am also ignoring the 1st argument to VirtualAlloc, i.e. lpAddress (see the VirtualAlloc specification in the problem statement).
This script uses two modules, PEFile and Pydbg. PEFile is used to get the entry point.
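The stack offsets used in virtual_handler (ESP, ESP + 8, ESP + 12, ESP + 16) follow from the layout of a 32-bit call: the return address sits at [ESP] on entry and the four arguments follow in order. As a minimal sketch of that layout, the plain-Python snippet below decodes all five values from one 20-byte read. The namedtuple, the function name and the sample values are assumptions made for illustration; in the real script the bytes would come from the debuggee's stack.

```python
import struct
from collections import namedtuple

# Field order mirrors the stack at the first instruction of VirtualAlloc
# in a 32-bit process: return address at [ESP], arguments from [ESP+4].
VirtualAllocFrame = namedtuple(
    "VirtualAllocFrame",
    "ret_addr lpAddress dwSize flAllocationType flProtect",
)


def parse_frame(raw):
    """Decode the 20 bytes starting at ESP into a VirtualAllocFrame."""
    return VirtualAllocFrame(*struct.unpack("<5L", raw[:20]))


if __name__ == "__main__":
    # Hypothetical bytes, as a 20-byte read starting at ESP might return
    # them; the values are made up for illustration.
    raw = struct.pack("<5L", 0x00401234, 0, 0x2000, 0x3000, 0x40)
    frame = parse_frame(raw)
    print("Return Address:  %#x" % frame.ret_addr)
    print("Allocation Size: %#x" % frame.dwSize)
    print("Allocation Type: %#x" % frame.flAllocationType)
    print("Protection Type: %#x" % frame.flProtect)
```

Reading the frame in one go trades four small memory reads for one, at the cost of assuming the whole 20-byte range is readable.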
Windbg

Windbg [Reference 3] is the official Microsoft debugger. It is the most powerful debugger available for reversing on the Windows platform (mainly the kernel side of it), and it also supports symbols.
Windbg provides its own scripting language, which is similar to C, and it also comes with a great help file. I highly recommend reading the help file before starting with Windbg.
Problem Statement:
We want to track malloc: whenever malloc is called, our script should display the requested allocation size and the returned pointer.
Solution:
Along the same lines as the previous example:
  1. Set a breakpoint on malloc
  2. Extract the parameter from the stack
  3. Extract the return address from the stack and set a breakpoint on it
  4. Get the returned value from the EAX register
bp msvcrt!malloc ".printf \"Size: %x\n\",poi(esp+4);gu;.printf \"Returned Pointer: %x\n\",eax;g"
When we use multiple commands on a single line, we have to separate them with a semicolon (;).
bp - sets a breakpoint
msvcrt!malloc - this is DLL!Method (the DLL name and function name are separated by !)
A breakpoint that runs a command block like this is the basis of what Windbg calls conditional breakpoints: we want to do something whenever the breakpoint is hit. In our case we want to extract the allocation size from the stack.
So the simple syntax is:
bp address or dll!method or dll!method+offset "block that should be executed when breakpoint hits"
poi - dereferences the pointer-sized value at the given address, similar to the * operator in C
gu - go up, i.e. execute until the current function returns
g - go, i.e. continue execution
For more interesting commands please check out the Windbg help file.
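Once the breakpoint has been running for a while, the debugger log contains alternating "Size:" and "Returned Pointer:" lines in the format produced by the two .printf calls. As a possible follow-up step, here is a small Python sketch that pairs them up outside Windbg. The function name and the sample log are hypothetical, and the pairing assumes the lines strictly alternate, which may not hold for nested or multi-threaded malloc calls.

```python
import re

SIZE_RE = re.compile(r"^Size: ([0-9a-fA-F]+)$")
PTR_RE = re.compile(r"^Returned Pointer: ([0-9a-fA-F]+)$")


def pair_allocations(log_text):
    """Pair each 'Size:' line with the next 'Returned Pointer:' line."""
    pairs = []
    pending_size = None
    for line in log_text.splitlines():
        line = line.strip()
        size_match = SIZE_RE.match(line)
        ptr_match = PTR_RE.match(line)
        if size_match:
            pending_size = int(size_match.group(1), 16)
        elif ptr_match and pending_size is not None:
            pairs.append((pending_size, int(ptr_match.group(1), 16)))
            pending_size = None
    return pairs


if __name__ == "__main__":
    # Made-up log text in the format printed by the breakpoint command
    sample_log = "Size: 20\nReturned Pointer: 3e2f10\nSize: 400\nReturned Pointer: 3e3000\n"
    for size, pointer in pair_allocations(sample_log):
        print("malloc(%#x) -> %#x" % (size, pointer))
```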
Conclusion
This article is additional learning material for our next session, 'Part 5 - Reverse Engineering Tools', part of our FREE Reversing/Malware Analysis course [Reference 1].
