Tuesday, June 3, 2008

Thrashing the Kernel (Linux 2.4.x)

Today a co-worker told me that a library I've been writing is causing a segfault, but he could not give me an IP (instruction pointer). Hmm. The library calls a custom driver and tries to pull data from the driver via ioctls.

Then I used the lib in a utility and presto! I hosed the PPC 450. The funny thing is that the utility executes picture-perfect and does not crash, and the kernel does not panic(), but everything executed after that segfaults.

Now that's artful.

Jun 4 update: I narrowed it down by #if 0'ing out code, and the problem is in the kernel driver.

Jun 10 update: The root cause was that they changed the way they were DMA-ing data into my user buffer that got mapped into kernel space. As I was malloc-ing the buffer and not touching it, the associated physical pages were not actually mapped, and that was confusing the PPC DMA engine.

Either memset-ing the buffer to zero in user space to fault the pages in, or forcing the mapping of the pages in kernel mode, fixed the problem.
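For the record, here is a minimal sketch of the user-space side of the fix. It is not the actual library code: alloc_touched_buffer is a made-up name, and instead of a plain memset it writes one byte per page through a volatile pointer. That faults the pages in just like the memset does, but some newer compilers may fold a malloc followed by a zeroing memset into calloc, which can leave the pages untouched again. Note that unlike memset, this zeroes only the touched bytes, not the whole buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: allocate a buffer and write to every page of it,
 * so each page is faulted in before the buffer is handed to a driver. */
unsigned char *alloc_touched_buffer(size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *buf;
    volatile unsigned char *p;
    size_t off;

    assert(len > 0);
    if (page <= 0)
        page = 4096;            /* assumption: fall back to 4 KiB pages */

    buf = malloc(len);
    if (buf == NULL)
        return NULL;

    /* The volatile writes cannot be optimized away. Stepping by one page
     * from the start hits every page the buffer spans except possibly
     * the last one, so the final byte is touched separately. */
    p = buf;
    for (off = 0; off < len; off += (size_t)page)
        p[off] = 0;
    p[len - 1] = 0;

    return buf;
}
```

The caller would then pass the returned pointer to the ioctl as before and free() it when done.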

-ulianov