Recently, I needed to write a bunch of Smali code to use in tests for Simplify. While Smali syntax is simple and fairly easy to write, it’s also tedious, and I needed to do some tricky, uncommon stuff that I wasn’t even sure how to do in Smali. Luckily, it’s pretty easy to write Java and convert it to Smali. I explained how to make a small alias for this, and went over some other use cases, in a previous post. Writing Java and converting it to Smali makes it easy to quickly prototype lots of Smali code without worrying about Smali syntax or conventions. In this post, I want to show how to use a new Android compiler called jack, which takes the place of dx and which you’ll need to know how to use if you want to keep converting Java to Smali.
The original Android compiler is dx, and it works by translating Java .class files to Dalvik executables (.dex). Jack, however, compiles Java source code directly, so you don’t need to invoke javac at all.
I originally had to learn about Jack because it looks like dx doesn’t support newer Java .class versions. I tried converting a Java 8 compiled class and dx gave me the following error:
This was a problem. I was trying to fix a bug, which exposed another bug, which exposed 4 or 5 things I could be doing much more cleanly, which in turn led to even more stuff I wanted to fix before fixing the original bugs. It’s a bit like this:
By the time I saw the dx error, I had about blown my stack. After a few milliseconds of panic, I calmed down and started looking through the build-tools directory of the most recent Android platform I had installed. I knew that Android must be able to convert Java 8 classes to .dex because people are making Android apps using it.
Lo and behold, there’s this jack.jar. I ran it just to see what it did and it gave this nice, long, helpful help message:
The main difference from dx is that, since Jack operates on Java source, it needs access to some Java runtime files. I assumed rt.jar would be enough. Here’s an example of how to use it with a file called
This is tweaked to work on a Mac, but it should be easy to translate to Linux or Windows.
I was curious whether it was possible to fingerprint Jack-built .dex files. There’s a lot of useful stuff you can do with compiler fingerprinting, such as detecting malware. I wanted to add the fingerprints to the database in APKiD. To find out how the files differ, I built two .dex files from the same Java but using different tools. In this case,
Here’s a small .dex file created with
Here’s the same Java converted with
See that cute little emitter: jack-4.12? Looks like Jack intentionally watermarks the files it creates. It might be possible to turn this off with a command-line parameter, but I haven’t looked. Here are the rules I added to APKiD to detect Jack: https://github.com/rednaga/APKiD/commit/ccca5ed519b7b2551a3205686be364c26020f1cd#diff-1731ed362177d8429d827a2b6ef3786bR131.
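As a rough illustration of the watermark check (not the actual APKiD rule linked above), here is a minimal Python sketch. It assumes the marker appears in the file as plain bytes in the form `emitter: jack-<version>`, as in the example above; the function name `jack_emitter_version` is made up for this sketch.

```python
import re

# Assumption: the watermark is stored as plain ASCII bytes in the .dex,
# in the same "emitter: jack-<version>" form seen in the dump above.
JACK_EMITTER = re.compile(rb"emitter: jack-([0-9.]+)")


def jack_emitter_version(dex_bytes):
    """Return the Jack version string from the watermark, or None if absent."""
    match = JACK_EMITTER.search(dex_bytes)
    if match is None:
        return None
    return match.group(1).decode("ascii")
```

To try it on a real file, read the bytes with `open(path, "rb").read()` and pass them in. A `None` result only means the marker was not found, not that Jack was definitely not involved.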
Shortly after announcing this blog on Twitter, @iBotPeaches (Apktool developer) was kind enough to point out another distinguishing characteristic of Jack-generated DEX files which is described here: https://github.com/iBotPeaches/Apktool/issues/1354.
The new Jack compiler changes the names of compiler-generated access$000 methods to names like -wrap0(). To build some .dex files for testing, here’s some Java code:
I saved this as OuterClass.java and compiled it explicitly with Java 1.7 because 1.8 isn’t supported by
Then, I used javap to get the javac-generated method names:
Yup, it’s all access$00\d stuff, so Jack must be the one changing these. No idea why it does this, apart from making it a bit clearer what the methods do, i.e. it’s easier to guess the behavior of
To get a Jack compiled version of
Now, to examine the differences in method names by looking at the strings:
Ok, seems obvious enough. I could look for these two sets of strings to figure out the compiler. But what happens if you use dexmerge to combine the jack.jar produced .dex files?
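The string check described above could be sketched roughly like this in Python. It only knows the two naming patterns mentioned in this post (`access$000`-style and `-wrap0`-style), the labels and the function name `accessor_styles` are made up for illustration, and it returns a set so that a file containing both styles is reported as such.

```python
import re

# Patterns taken from the names discussed in this post; a real tool
# would likely need more of them.
JAVAC_STYLE = re.compile(r"^access\$\d+$")  # e.g. access$000
JACK_STYLE = re.compile(r"^-wrap\d+$")      # e.g. -wrap0


def accessor_styles(strings):
    """Return the set of accessor naming styles seen in an iterable of strings."""
    styles = set()
    for s in strings:
        if JAVAC_STYLE.match(s):
            styles.add("javac/dx")
        elif JACK_STYLE.match(s):
            styles.add("jack")
    return styles
```

An empty result means neither pattern was seen, which says nothing about the compiler since not every class needs synthetic accessors.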
Here’s the alias I use for
In case you’re curious, here’s the usage:
My guess was that this would merge all the strings but only the code in the first .dex would be kept, leaving several strings unreferenced. Unreferenced strings might actually be an interesting heuristic for finding weird .dex files, but it’s not something you could do without some disassembly.
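A first step toward that kind of heuristic is simply listing the string table. Here is a minimal Python sketch that does this, assuming the standard DEX header layout (`string_ids_size` at offset 0x38 and `string_ids_off` at 0x3C, little-endian). It does not decide which strings are referenced, which would still require walking the code. The names `read_uleb128` and `dex_strings` are mine, and decoding MUTF-8 as UTF-8 is an approximation.

```python
import struct


def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 value at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def dex_strings(buf):
    """Return the strings in a .dex string table, in table order."""
    # string_ids_size and string_ids_off sit at 0x38 and 0x3C in the header.
    count, table_off = struct.unpack_from("<II", buf, 0x38)
    strings = []
    for i in range(count):
        # Each string_id_item is a 4-byte offset to a string_data_item.
        (data_off,) = struct.unpack_from("<I", buf, table_off + 4 * i)
        # Skip the uleb128 UTF-16 length prefix; data is NUL-terminated.
        _, start = read_uleb128(buf, data_off)
        end = buf.index(b"\x00", start)
        # Approximation: MUTF-8 treated as UTF-8, bad bytes replaced.
        strings.append(buf[start:end].decode("utf-8", errors="replace"))
    return strings
```

There is no validation here (magic, endianness, bounds), so treat it as a way to poke at files you built yourself rather than something to point at untrusted input.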
Lo and behold, the strings from each are retained:
This .dex has an interesting history. If you know what to look for, you can tell quite a bit about how it was made, which may help you infer how technically sophisticated the creator was and what tools and environment they were using. You’d know straight away that it’s the result of dexmerge because of the map type ordering (search for ABNORMAL_TYPE_ORDER (note to self: number slides in the future)). You would also know parts of the file were created with dx or dexlib 2.x and parts were created with