background: 10K+ compiled class files and sources that got out of sync [1]; need to figure out which sources are valid, and which ones are not.
decompiling things is the last resort, since the sources it produces are not easily diffable against the sources you’ve got. the likes of diffj are not much help either, and i did not even want to go down the rabbit hole of normalization through obfuscators.
so if you do not feel like wielding antlr or javacc to normalize two sources, the obvious approach is to simply recompile and compare with the existing class files (just beware of missing class files that might not have any sources at all).
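the compare step can be sketched roughly like this, assuming the original class files and the freshly recompiled ones sit in two separate directory trees. the class name ClassTreeDiff and the verdict strings are made up for illustration; it is a byte-for-byte comparison and nothing smarter.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// hypothetical helper, not part of any tool mentioned in the post
final class ClassTreeDiff {
    // relative paths of every .class file under root, sorted
    static SortedSet<String> classFiles(Path root) {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .map(p -> root.relativize(p).toString())
                       .filter(name -> name.endsWith(".class"))
                       .collect(Collectors.toCollection(TreeSet::new));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // true if both files hold exactly the same bytes
    static boolean sameBytes(Path a, Path b) {
        try {
            return Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // maps each suspicious class file to a short verdict; identical files are left out
    static Map<String, String> diff(Path original, Path recompiled) {
        SortedSet<String> old = classFiles(original);
        SortedSet<String> fresh = classFiles(recompiled);
        Map<String, String> report = new TreeMap<>();
        for (String name : old) {
            if (!fresh.contains(name)) {
                // an original class file that no source produced
                report.put(name, "missing after recompile (no source?)");
            } else if (!sameBytes(original.resolve(name), recompiled.resolve(name))) {
                report.put(name, "differs");
            }
        }
        for (String name : fresh) {
            if (!old.contains(name)) report.put(name, "only in recompiled tree");
        }
        return report;
    }

    public static void main(String[] args) {
        // usage: java ClassTreeDiff <original-classes-dir> <recompiled-classes-dir>
        diff(Paths.get(args[0]), Paths.get(args[1]))
            .forEach((name, verdict) -> System.out.println(verdict + ": " + name));
    }
}
```

the "missing after recompile" verdict is the case of class files that have no sources at all.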
however, keep in mind that javac by default includes a line number table in the class file it produces [2]. this means that even if you added or removed a comment line, or even a blank line, anywhere above the code in a source file, the result would be a class file that differs from the original.
sometimes you have another top-level class inside the same .java file (not to be confused with inner classes). in this case it gets compiled into a separate class file. so if your main class’ source code has changed, it can shift the line number table of the other class as well, which means that even though the other class’ source has not changed, its generated class file will be different.
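to see the line number effect in isolation, here is a small experiment, assuming it runs on a full jdk (the compiler api is not available on a plain jre). the two-class source and the names LineTableDemo, Main and Helper are made up for illustration. it compiles the same file twice, once with a blank line prepended, and checks whether Helper.class stays byte-for-byte identical.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// hypothetical experiment, not the procedure used in the post
final class LineTableDemo {
    // sample source: two top-level classes in one file
    static final String SOURCE =
        "public class Main { int one() { return 1; } }\n" +
        "class Helper { int two() { return 2; } }\n";

    // writes source to Main.java in a fresh temp dir, compiles it with the
    // given javac flags and returns the bytes of <className>.class
    static byte[] compile(String source, String className, String... flags) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null on a plain JRE
        if (javac == null) throw new IllegalStateException("needs a JDK, not a JRE");
        try {
            Path dir = Files.createTempDirectory("linetable");
            Path file = dir.resolve("Main.java");
            Files.write(file, source.getBytes(StandardCharsets.UTF_8));
            List<String> args = new ArrayList<>(Arrays.asList(flags));
            args.addAll(Arrays.asList("-d", dir.toString(), file.toString()));
            int status = javac.run(null, null, null, args.toArray(new String[0]));
            if (status != 0) throw new IllegalStateException("javac exited with " + status);
            return Files.readAllBytes(dir.resolve(className + ".class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // true if prepending a blank line leaves the class file byte-for-byte identical
    static boolean survivesBlankLine(String className, String... flags) {
        byte[] before = compile(SOURCE, className, flags);
        byte[] after = compile("\n" + SOURCE, className, flags);
        return Arrays.equals(before, after);
    }

    public static void main(String[] args) {
        System.out.println("Helper, default flags: " + survivesBlankLine("Helper"));
        System.out.println("Helper, -g:none:       " + survivesBlankLine("Helper", "-g:none"));
    }
}
```

Helper’s own source never changes here, only its position in the file does, and with default flags that is enough to change its class file. with -g:none the line number table is omitted, so the blank line should make no difference.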
in my case i also had to check which jdk compiler produced the class files. one can always opt for javap -verbose, which prints the minor and major versions, or if you are feeling manly enough, whip out your favorite hex editor and check bytes 6 and 7, counting from zero, which hold the major version (according to the vm spec).
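if neither javap nor a hex editor is handy, the same check fits in a few lines of java. this is a minimal sketch following the class file layout (4 bytes of magic, then a 2-byte minor version, then a 2-byte major version, all big-endian); the class name ClassVersion is made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// hypothetical helper, just a sketch of the header check
final class ClassVersion {
    // returns {minor, major} read from the first 8 bytes of a class file
    static int[] parse(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("too short to be a class file");
        }
        ByteBuffer header = ByteBuffer.wrap(classBytes); // big-endian by default
        if (header.getInt() != 0xCAFEBABE) {             // bytes 0-3: magic
            throw new IllegalArgumentException("bad magic, not a class file");
        }
        int minor = header.getShort() & 0xFFFF;          // bytes 4-5
        int major = header.getShort() & 0xFFFF;          // bytes 6-7
        return new int[] {minor, major};
    }

    public static void main(String[] args) throws IOException {
        // usage: java ClassVersion path/to/Some.class
        int[] version = parse(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("minor=" + version[0] + " major=" + version[1]);
    }
}
```

as a rule of thumb, major version 48 means a 1.4 target, 49 means java 5, 50 means java 6, and so on.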
in general, javap is the easiest way to check the internal structure of the class file.
finally, to diff files and directories i simply used svn – check the originals into the local repo, then put your stuff in a working copy – it will do the rest. after all, this is what it’s good at.
oh, and why CAFE BABE? look it up.
[1] this is a whole different and interesting topic – how any sort of generated content creates a possibility of this disconnect. all these xdoclets, jaxb-generated sources, and even compiled classfiles create artifacts that now have a potential of getting out of sync with the sources. yes, proper engineering practices mitigate the risks, but all things being equal, i like the fact that with scripting languages this problem is largely non-existent. what you see is what you run.
[2] you could always run javac with -g:none to get rid of line numbers, but it was no help in my case.