As in the old Stacker days, the box can decompress faster than it can read from CF, so a compromise has been reached: store the binary gzip'ed, decompress it to /tmp (a ramdisk) and run it from there.
This embedded system does not have swap enabled, but in low-memory situations the kernel uses demand paging for the read-only pages in the .text area of a binary. I.e. it steals LRU code pages from in-core, knowing they will be found in the on-disk binary. This is why one gets a "Text file busy" error when one tries to alter a binary which is running.
In our case we end up with basically two copies of the binary in-core (this is a GCJ-compiled Java app, so the .text is fairly substantial, though 90% of it is junk).
Then I took the upx -9 route and the results are quite interesting. The compressed binary shrank to 7M which means faster load time and a smaller software installer.
Here is some sample C code:

    #include <stdio.h>
    #include <unistd.h>

    int main()
    {
        char c = 0;
        printf("Press ENTER:"); fflush(stdout);
        read(0, &c, 1);
        return 0;
    }

The static stripped binary is 377,204 bytes and the upx'ed static binary is 174,880 bytes. size(1) reports for a.out:

    text    data   bss   dec     hex    filename
    371111  3144   4448  378703  5c74f  a.out

Running and suspending the binaries, /proc/<pid>/status reports (plain a.out on the left, upx'ed on the right):

    VmSize:   516 kB   524 kB
    VmLck:      0 kB     0 kB
    VmRSS:    124 kB    96 kB
    VmData:   140 kB   508 kB
    VmStk:      8 kB    12 kB
    VmExe:    364 kB     4 kB
    VmLib:      0 kB     0 kB
So upx moves the code from .text into the data segment of the running binary. Bye-bye demand paging, but at least we don't (theoretically) keep two copies of .text in core.
In practice Linux cheats and does not fault in all the pages of the binary when loading it... it loads enough to make it start and it's lazy about the rest... if the binary needs those pages they will be faulted in later.
Or this is a bed-time story for bearded UN*X hackers.