Dalvik Virtual Execution with SmaliVM

Sometimes it’s useful to know what code does without executing it. You could read the code with your eyeballs and run it with your brain, but that takes too long and it’s really hard. Executing code on a real machine can get messy, especially if it’s malicious. But what can you do if you want to understand a lot of malicious code? What if it’s obfuscated and even harder for your brain? Maybe you want to do some fancy analysis so you know exactly when certain methods are called? Well, for this there’s executing on a virtual machine, i.e. virtual execution. There are many different ways of implementing a virtual machine. You could simulate an entire computer, as VirtualBox and QEMU do, or you could simulate a smaller subset. The general idea is the same for all types: build a program which simulates executing other programs in all the important ways and fails gracefully for everything else.
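
To make the "simulate the important parts, fail gracefully for the rest" idea concrete, here is a minimal sketch. It is not how SmaliVM is implemented; the three-opcode instruction set (const, add, call), the string-array instruction format, and the TinyVm name are all assumptions made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TinyVm {
    // Sentinel meaning "this value cannot be known without really running the code".
    static final Object UNKNOWN = new Object() {
        @Override
        public String toString() {
            return "UNKNOWN";
        }
    };

    // Hypothetical format: {opcode, destination register, operand, operand}.
    static Map<String, Object> execute(List<String[]> program) {
        Map<String, Object> registers = new HashMap<>();
        for (String[] ins : program) {
            switch (ins[0]) {
                case "const":
                    registers.put(ins[1], Integer.parseInt(ins[2]));
                    break;
                case "add": {
                    Object a = registers.getOrDefault(ins[2], UNKNOWN);
                    Object b = registers.getOrDefault(ins[3], UNKNOWN);
                    if (a instanceof Integer && b instanceof Integer) {
                        int sum = (Integer) a + (Integer) b;
                        registers.put(ins[1], sum);
                    } else {
                        // An unknown input makes the result unknown too.
                        registers.put(ins[1], UNKNOWN);
                    }
                    break;
                }
                default:
                    // Unsupported instruction: fail gracefully by marking the result unknown.
                    registers.put(ins[1], UNKNOWN);
            }
        }
        return registers;
    }

    public static void main(String[] args) {
        List<String[]> program = List.of(
            new String[] {"const", "v0", "2"},
            new String[] {"const", "v1", "3"},
            new String[] {"add", "v2", "v0", "v1"},
            new String[] {"call", "v3", "readFromNetwork"},
            new String[] {"add", "v4", "v2", "v3"});
        System.out.println(execute(program));
    }
}
```

In this sketch v2 ends up as a known value, while v3 and everything derived from it stay unknown. The simulator never touches the network and never crashes on the instruction it does not understand.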

Read More

Why Most Vulnerabilities Are Never Disclosed

When it comes to writing software, humans are the best game in town. Unfortunately, we’re absolutely terrible at it. Of course, we’re good at other stuff – recognizing faces, tool use, gossiping, and bipedal locomotion, but it turns out our brains are not so good at giving a computer the thousands of tiny, precise instructions necessary to validate an email address or properly deal with names. The fact that we get anything to work at all is amazing.

The bottom line is that if developers are writing code, they’re writing bugs and some bugs are vulnerabilities. Some are found and responsibly disclosed while others are kept secret or sold. For reasons which I shall explain, I believe that most security vulnerabilities are fixed but never disclosed.

Read More

Code Kata: Bloom Filter

If you’re unfamiliar with what a Code Kata is, check out my previous post, Code Kata: TDD and Run-length Encoding.

The goal for this kata is to learn an unfamiliar data structure. It’s called a bloom filter. I’ve read the Wikipedia article and have used them, but until I’ve made it myself I don’t understand it deeply. The more fundamental my understanding, the more flexible I can be in applying a concept. It’s just like calculus. There’s a world of difference between merely memorizing a formula and having a deep, intuitive understanding.
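
For readers who have not met the data structure before, here is a minimal sketch of a bloom filter. It is not the kata solution from the full post. The class name, the use of String.hashCode(), and the double-hashing scheme used to derive bit positions are assumptions chosen to keep the example short.

```java
import java.util.BitSet;

public class BloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    public BloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int position(String item, int i) {
        int h1 = item.hashCode();
        int h2 = (h1 >>> 16) | 1; // assumption: a cheap second hash, forced to be odd
        return Math.floorMod(h1 + i * h2, size);
    }

    public void add(String item) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(item, i));
        }
    }

    // false means "definitely never added"; true only means "possibly added".
    public boolean mightContain(String item) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(item, i))) {
                return false;
            }
        }
        return true;
    }
}
```

The trade-off is that the filter never stores the items themselves. Lookups can return false positives when other items happen to set the same bits, but never false negatives.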

Read More

Reversing an Open Source Vulnerability

Vulnerability disclosures rarely include enough technical detail to reproduce the exploit. This is a good thing. It wouldn’t do to arm every script kiddie with exact details of how to write an exploit with every disclosure. However, there are times when someone like an application security engineer or security researcher needs to “reverse engineer” the disclosure to reconstruct the technical detail in order to fully understand the vulnerability or write an exploit to test systems for weakness.

Read More

Code Kata: TDD and Run-length Encoding

A kata is a martial arts training method. It’s a set of detailed and choreographed movements and poses. The movements are performed repeatedly and internalized. A code kata is a training method for developing skill in programming. Take something you do frequently, or wish to do better, strip away everything not essential, and practice it repeatedly.

Read More

TetCon 2016 - Android Deobfuscation: Tools and Techniques

I gave a talk at TetCon 2016 about Android obfuscation and deobfuscation.

The talks at TetCon were great and the people there were super nice. I got all kinds of new ideas and spent the entire flight home furiously coding. Super motivating to hear from and talk to other people working on similar problems. Thanks to the organizers and volunteers Thai for making everything happen.

Also, special thanks to everyone for speaking English around me!

Slides + Video




The state of Android deobfuscation is weak. Commercial obfuscators are getting more common, and reversers need to understand how to deobfuscate them. This talk provides an overview of different obfuscation types. After that, it describes two code deobfuscation approaches: pattern recognition and virtual execution.

Pattern recognition focuses mainly on identifying obfuscation patterns, crafting them into regular expressions, and then repeatedly applying pattern-based transformations to the code. Insight into code behavior is improved by executing certain methods in a limited way and storing the result.
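
As a rough illustration of the pattern approach, the sketch below rewrites one made-up obfuscation pattern in smali text. Everything specific here is an assumption: the decoder class Lcom/example/Obf;, the idea that it is plain Base64, and the PatternDeobfuscator name are hypothetical and do not come from the talk or from any real obfuscator.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternDeobfuscator {
    // Hypothetical pattern: load an encoded literal, call a static decoder, keep the result.
    private static final Pattern ENCODED_STRING = Pattern.compile(
        "const-string (v\\d+), \"([A-Za-z0-9+/=]+)\"\\s+"
        + "invoke-static \\{\\1\\}, Lcom/example/Obf;->decode\\(Ljava/lang/String;\\)Ljava/lang/String;\\s+"
        + "move-result-object (v\\d+)");

    static String deobfuscate(String smali) {
        Matcher m = ENCODED_STRING.matcher(smali);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // "Limited execution": run the decoder ourselves and store the result.
            // Assumes the literal is valid Base64; decode() throws IllegalArgumentException otherwise.
            String decoded = new String(
                Base64.getDecoder().decode(m.group(2)), StandardCharsets.UTF_8);
            // Note: a real tool would also escape quotes and backslashes in the decoded text.
            String replacement = "const-string " + m.group(3) + ", \"" + decoded + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

A real deobfuscator would hold many such patterns and keep applying them until the code stops changing, since one rewrite can expose another pattern.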

Virtual execution involves simulating the application’s code to determine its semantics. A context-sensitive graph is generated, representing every possible execution path and all possible register and class states for each execution of each instruction. The graph is then analyzed, and the code is modified so that it is easier to understand but behaves identically.

Decompiling XAPK Files

While reviewing new Android reverse engineering questions on Stack Overflow, I came across this request to decompile an .xapk. A brief, non-technical description of the format is given on APKPure’s website:

XAPK is a brand new file format standard for Android APK package file. Contains all APK package and obb cache asset file to keep Android games or apps working, it always ends in “.xapk”. To ensure games, applications run perfectly, APK Install one click install makes it easy for Android users directly install .apk, .xapk file to the root directory.
obb cache data?

Read More

How does Dalvik handle 'this' registers?

The this Reference

For every instance (virtual, non-static) method in Dalvik, the first parameter is a reference to itself, or, in Java, the this reference. I wanted to know if it was legal to reassign the register value.

Just so I’m sure you know what I’m talking about, here’s a simple Java class with an instance method called instanceMethod:

public class Instance {
    private int number = 5;

    public int instanceMethod() {
        return this.number;
    }
}

Read More

Why Anti-Virus Software Sucks

Everyone knows anti-virus products suck, and you can say anti-virus sucks for many different reasons and at different levels. You could start with obvious, surface-level reasons: anti-virus software (AV) sucks because it’s slow, clunky, self-advertising garbage that slows your machine down. From there, you could move on to more perceptive complaints such as how it hardly ever detects new malware and almost certainly will not detect fancypants, bespoke, advanced persistent threats (APT). You could go still deeper and claim that there’s something wrong with an industry that thrives on selling people fear and selling companies mere compliance so their insurance doesn’t laugh in their faces when they try to collect after getting their gibson’s backdoor hacked.

The obvious question is then why do AV products suck? Malware is a big problem that costs people money and heartache all the time. Why isn’t this solved better? To answer that, you need to understand the problem at the most fundamental level. For me, this means understanding the condition in terms of economic principles: incentives, constraints, market forces at work, and so on. Once you understand something at this level, you can usually extrapolate most of the symptoms yourself and, importantly, you’ll have a much better idea of how to actually fix it. This brings me to my main thesis: AV software sucks because it’s impossible for the market to be informed, to meaningfully differentiate between products, and to objectively determine which one is better. Because of this, there isn’t much incentive for companies to make lean, clean, optimized AV products with amazing, complex detection capabilities and behavior analysis. They can’t compete on quality, because people can’t tell the difference between great and crap, so they have to compete on sales and advertising.


Read More