As in the old Stacker days, the box can decompress faster than it can read from CF, so a compromise has been reached: store the binary gzip'ed, decompress it to /tmp (a ramdisk) and run it from there.
This embedded system does not have swap enabled, but in low-memory situations the kernel uses demand paging for the read-only pages in the .text area of a binary. I.e. it steals LRU code pages from in-core, knowing they will be found in the on-disk binary. This is why one gets a "Text file busy" error when one tries to alter a binary which is running.
In our case we end up with basically two copies of the binary in-core (this is a GCJ-compiled Java app, so the .text is fairly substantial, though 90% of it is junk).
Then I took the upx -9 route and the results are quite interesting. The compressed binary shrank to 7M which means faster load time and a smaller software installer.
Here is some sample C code:

    #include <stdio.h>
    #include <unistd.h>

    int main()
    {
        char c = 0;
        printf("Press ENTER:"); fflush(stdout);
        read(0, &c, 1);
        return 0;
    }

The static stripped binary is 377,204 bytes and the upx'ed static binary is 174,880 bytes. size(1) reports for a.out:

    text    data   bss   dec     hex    filename
    371111  3144   4448  378703  5c74f  a.out

Running and suspending the binaries, /proc/<pid>/status reports (plain a.out on the left, upx'ed on the right):

    VmSize:   516 kB   524 kB
    VmLck:      0 kB     0 kB
    VmRSS:    124 kB    96 kB
    VmData:   140 kB   508 kB
    VmStk:      8 kB    12 kB
    VmExe:    364 kB     4 kB
    VmLib:      0 kB     0 kB
So upx moves the code from .text into the data segment of the running binary. Bye-bye demand paging, but at least we don't (theoretically) keep two copies of .text in core.
In practice Linux cheats and does not fault in all the pages of the binary when loading it... it loads enough to make it start and it's lazy about the rest... if the binary needs those pages they will be faulted in later.
Or this is a bed-time story for bearded UN*X hackers.