De-obfuscating Jump Chains with Binary Ninja

Published on: 20 03 2018 | Modified on: 20 06 2024

Author: Medhi Talbi

2 minutes

Malware authors uses extensive obfuscation techniques such as packing, junk code insertion, opaque predicates to harden malware analysis. Binary ninja has recently released a plugin to remove opaque predicates - that is, branch paths that are never taken. Thanks to Medium Level Intermediate Language (MLIL), only few lines of python of code are required to detect and remove opaque predicates. Following this blog post, we decided to implement a similar plugin to de-obfuscate CFGs by removing useless jump chains.

Detecting chained jumps

A basic block can be merged with its child if the basic block ends with a jump instruction and its child has only one parent. For example, in the following figure, the top basic block can be merged with its direct child.

Chained Jumps

Binary Ninja API provides facilities to check incoming and outgoing edges. This allows us to detect useless jumps:

nb_edges = len(edges)
if nb_edges > 0 and inst_name in JUMPS:
	if nb_edges == 2 or (nb_edges == 1 and len(edges[0].target.incoming_edges) != 1):
		lifted_inst = "{} loc_{:x}".format(inst_name, edges[0].target.start)
	else:
		return ''

Processing basic blocks

The simplest way to remove useless jumps (and occasionnaly superfluous nop instructions), is to rewrite the basic blocks by travsersing the CFG, and then re-assemble the instructions:

def flatten(self, address):
	code = ''
	self.todo(address)

	while len(self._todo) > 0:
		address = self._todo.pop()

		if address in self._visited:
			continue

		self._visited.append(address)
		block = self._function.get_basic_block_at(address)
		code += self.process_block(block)
		edges = block.outgoing_edges
		if not edges:
			continue
		if len(edges) == 1:
			self.todo(edges[0].target.start)
		else:
			self.todo(edges[0].target.start)
			self.todo(edges[1].target.start)

	return code

For each basic block, we create a label at the top of the block ('loc_' + basic_block.start). Then we process each instruction in the block by discarding nops, mapping symbols (and tokens such as 'sub_' and 'data_') to their addresses, and remove the jump if the block can be merged with its child.

Note that if the block ends with a branch instruction, we have to substitute the target address of the branch instruction with its corresponding label. For instance, the instruction 'je 0x404142' will be substituted by 'je loc_0x404142'. This is required since original instruction offsets are shifted due to instruction removal (nops and useless jumps).

Once all basic blocks have been processed, we assemble the resulting code with:

def assemble(self, code):
	return self._arch.assemble(code, self._function.start)

Please note that sometimes Binary Ninja fails to assemble it's own disassembled code. This is the case for example for stos{b,w,d} instructions. Binary Ninja uses its own disassembler and sometimes instructions does not round-trip when assembled with yasm (default x86 assembler in Binary Ninja).

De-obfuscating CFGs

We have applied the plugin on some functions of the Locky-ransomware (a92e469b45331c7102f4981df9699843) which intensively relies on jump chains to obfuscate the binary. For example, the CFG of the main Locky function ('sub_929ba0') is made of 1102 basic blocks. After applying the plugin, we got a CFG made of 65 basic blocks.