Tuesday, April 7, 2009

When Classes Instantiated as auto Vars on Stack are Evil

I have a threaded C++ app that uses message queues to pass data among threads like so (UML sequence diagram follows):
Thread A (with response-Q) enqueues request in Thread B's input-Q
Thread A blocks on response-Q empty
Thread B wakes up [input-Q non-empty], dequeues request
Thread B munches on the request from A
Thread B enqueues result in A's response-Q
Thread B blocks on input-Q empty
Thread A wakes up [response-Q non-empty], dequeues response
Thread A goes its merry way.
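For the record, the exchange above can be sketched roughly as follows. This is not the app's actual code: the Queue here is a hypothetical stand-in built on std::mutex and std::condition_variable (the original sat on pthreads), it carries plain ints, and the "munching" is just a doubling. The names threadB and roundTrip are made up for the illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// Hypothetical stand-in for the post's Queue class; carries ints for brevity.
class Queue {
public:
    explicit Queue(const std::string& name) : name_(name) {}

    const std::string& name() const { return name_; }

    // Adds an item and wakes one thread blocked in dequeue().
    void enqueue(int v) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            items_.push(v);
        }
        nonEmpty_.notify_one();
    }

    // Blocks while the queue is empty, then returns the oldest item.
    int dequeue() {
        std::unique_lock<std::mutex> lock(mtx_);
        nonEmpty_.wait(lock, [this] { return !items_.empty(); });
        int v = items_.front();
        items_.pop();
        return v;
    }

private:
    std::string name_;
    std::mutex mtx_;
    std::condition_variable nonEmpty_;
    std::queue<int> items_;
};

// Thread B: dequeue one request, work on it, answer on A's response queue.
void threadB(Queue* inputQ, Queue* respQ) {
    int request = inputQ->dequeue();
    respQ->enqueue(request * 2);  // doubling is a placeholder for the real work
}

// Thread A's side of the exchange; returns the response it received.
int roundTrip(int request) {
    std::unique_ptr<Queue> inputQ(new Queue("B's input Q"));
    std::unique_ptr<Queue> respQ(new Queue("A's response Q"));
    std::thread b(threadB, inputQ.get(), respQ.get());
    inputQ->enqueue(request);         // A enqueues the request in B's input-Q
    int response = respQ->dequeue();  // A blocks until B has answered
    b.join();                         // B must finish before the queues are freed
    return response;
}
```

Calling roundTrip(21) plays both sides once: A enqueues 21, B doubles it, and A wakes up with the answer.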

Nice and easy, eh? And it has worked OK for a while...

You know how it is: as one keeps adding code, the bugs get shifted around on the shelf and rear their nasty heads. In my case Thread A happened to be the main thread, and its response-Q was declared as
Queue respQ("A's response Q");
Now this is an auto var that lives on Thread A's stack.

Thread B was doing its job but when it wanted to enqueue the response in Thread A's respQ [it got a pointer to &respQ via a param] it would block in
pthread_mutex_lock()
in libc.

Bummer! I spent three hours writing LOCK/UNLOCK macros in class Queue that would confess who called them and in which thread, and then matching up the results (yes, gdb was borked on that ppc target and was useless with inf thr et al.). I really did see that the Queue instance of Thread A was blocking in
pthread_mutex_lock()
but nobody had locked that mutex before!!
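Such confessing macros might look something like the sketch below. The original macros are not shown in this post, so the names tracedLock and tracedUnlock, the log format, and the choice of stderr are all assumptions. Printing pthread_t as an unsigned long is a debugging convenience only, since the type is opaque.

```cpp
#include <cassert>
#include <cstdio>
#include <pthread.h>

// Hypothetical tracing wrappers: report the call site and the calling thread.
#define LOCK(m)   tracedLock((m), __FILE__, __LINE__)
#define UNLOCK(m) tracedUnlock((m), __FILE__, __LINE__)

inline int tracedLock(pthread_mutex_t* m, const char* file, int line) {
    // pthread_t is opaque; the cast is only good enough for eyeballing logs.
    unsigned long self = (unsigned long) pthread_self();
    std::fprintf(stderr, "[thr %lu] %s:%d wants mutex %p\n",
                 self, file, line, (void*) m);
    int rc = pthread_mutex_lock(m);
    std::fprintf(stderr, "[thr %lu] %s:%d got mutex %p (rc=%d)\n",
                 self, file, line, (void*) m, rc);
    return rc;
}

inline int tracedUnlock(pthread_mutex_t* m, const char* file, int line) {
    unsigned long self = (unsigned long) pthread_self();
    int rc = pthread_mutex_unlock(m);
    std::fprintf(stderr, "[thr %lu] %s:%d released mutex %p (rc=%d)\n",
                 self, file, line, (void*) m, rc);
    return rc;
}
```

A "wants" line with no matching "got" line points at the thread and call site that is stuck.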

The funny part is that I had that mutex properly initialised in Queue::Queue(); I even changed its attribute to error-checking but it just did not help! That mutex would behave as if uninitialised and containing garbage!
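For reference, switching a pthread mutex to error-checking goes roughly like this. The helper name initErrorCheckingMutex is made up, and this is a sketch rather than what Queue::Queue() actually contained. With a properly initialised error-checking mutex, a relock by the owning thread or an unlock of a mutex the thread does not hold comes back as an error code instead of hanging.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Initialises *m as an error-checking mutex.
// Returns 0 on success or an error number from the failing pthread call.
int initErrorCheckingMutex(pthread_mutex_t* m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);  // the attr object is no longer needed
    return rc;
}
```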

After a while you get bored of this kind of debugging so I changed Thread A's code to read
Queue* respQ = new Queue("A's response Q");
and everything went smoothly afterwards.

This yields the following article of faith:
Class objects declared as auto vars on the stack are evil.
-ulianov

Thursday, April 2, 2009

When GCC is Not Smart Enough to Help

Have you noticed that the later versions of gcc have added lots of bogus warning-errors that suck the joy out of programming? Well, even with
-Wall -Werror
sometimes it won't help:
class X {
public:
typedef enum {
ERR1,
ERR2
} IOErrorCode;
const char* IOstrerror(IOErrorCode c); // string describing the error
};

struct S {
// ...
X::IOErrorCode status;
};
On another day and in another file I coded:
struct S* res = (struct S*) malloc(sizeof(struct S)); // C++ needs the cast from void*
if(! DoSomeIO(res))
printf("IO/Error: %s (%d)\n", \
X::IOErrorCode(res->status), res->status);//(*)
and when I ran the program I would get a segfault at line (*) and GDB was indicating that the stack was partially smashed! Niice!

After scratching my head for half an hour it occurred to me that I had made a mistake: I coded
IOErrorCode(res->status) // BAAD
instead of
IOstrerror(res->status) // OK
The former is [in C++] a function-style typecast to type IOErrorCode, so printf() receives an enum value where %s expects a char* and crashes when it treats that value as a pointer.

The latter is a function call.
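The difference can be pinned down at compile time. In the sketch below IOstrerror is made static so it can be called without an X instance (an assumption, the declaration above is non-static), the error strings are invented, and formatIOError is a hypothetical helper that builds the message the way it was meant to be built.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <type_traits>

class X {
public:
    typedef enum { ERR1, ERR2 } IOErrorCode;
    // static is an assumption made for this sketch; the strings are made up.
    static const char* IOstrerror(IOErrorCode c) {
        return c == ERR1 ? "error one" : "error two";
    }
};

// The function-style cast yields the enum itself...
static_assert(
    std::is_same<decltype(X::IOErrorCode(X::ERR2)), X::IOErrorCode>::value,
    "cast yields an IOErrorCode, not a string");
// ...while the function call yields the C string that %s expects.
static_assert(
    std::is_same<decltype(X::IOstrerror(X::ERR2)), const char*>::value,
    "call yields a const char*");

// Hypothetical helper: formats the message with the function call, not the cast.
int formatIOError(char* buf, std::size_t n, X::IOErrorCode status) {
    return std::snprintf(buf, n, "IO/Error: %s (%d)",
                         X::IOstrerror(status), static_cast<int>(status));
}
```

Both spellings compile because each is a valid expression; only the types differ, and printf() cannot check them on its own.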

Ha! Not paying attention to my own code! And I had this sequence in five places handling I/O errors!

This is the most dangerous kind of error as one hits this code path infrequently (sometimes it only happens at a client's site thus driving the client mad).

-ulianov