Monday, December 14, 2009

How to Find Out Which Code Is Actually Disabling IRQs

When one applies the PREEMPT_RT patch, the spinlocks and mutexes get horribly overloaded. Using cli/sti directly is frowned upon (on x86), so one ends up with macros of macros of macros for locking, and it is a headache to see which functions are actually disabling interrupts.

Yet we are on x86 and the locking sequence generated by GCC is simple:
cli
...
sti
or it could be a bit more complicated (save EFLAGS, disable, restore EFLAGS):
pushf
pop %reg
cli
...
push %reg
popf
Luckily, objdump and a bit of massaging come to the rescue in the form of a neat script that finds all the object files that make up the compiled kernel and then searches the disassembler output for these sequences.

Bonus: the function names that do the locking are also printed.

Here comes findcli_x86.sh [run it at the root of the kernel tree after you compile the kernel]:
#!/bin/bash

# This file is licensed under GPLv2

# Note: this works only with x86 code; it needs bash ("function", "echo -en")

[ -z "$OBJDUMP" ] && OBJDUMP=${CROSS_COMPILE}objdump

obj_list=$(mktemp)
trap 'rm -f "$obj_list"' 0 1 2 15

# For one object file, list per function the cli/sti/popf instructions found
function extractcli()
{
	file=$1;

	[ ! -r "$file" ] && return 1;

	$OBJDUMP -dSCl "$file" | awk 'BEGIN {
		idx="nosuchfunc"
	}
	/(^[a-zA-Z_]|cli([^a-z]|$)|popf|sti([^a-z]|$))/ && !/(file format|Disassembly)/ {
		if($0 ~ /^[a-zA-Z_]/) {
			idx=$1      # a "function():" label printed by objdump -l
		} else {
			F[idx]=(F[idx] "\n" "\t" $1 "\t" $NF);  # address, mnemonic
		}
	}
	END {
		for(idx in F) {
			print (" " idx F[idx])
		}
	}'
	return 0
}

# main

find . -type f -name '*.o' > "$obj_list"
for obj in $(cat "$obj_list")
do
	o=$(echo $obj | sed 's/^\.\///g')
	[ "$o" = 'vmlinux.o' ] && continue; # this is the whole kernel
	echo $o | grep -q 'built-in\.o' && continue; # these are aggregations

	echo -en " \r$o" 1>&2     # progress indicator, on stderr
	cnt=$($OBJDUMP -dS "$o" | grep -cw cli)
	[ "$cnt" = '0' ] && continue;

	echo -en " \r" 1>&2

	src='???';
	c=$(echo $o | sed 's/\.o$/\.c/g')
	S=$(echo $o | sed 's/\.o$/\.S/g')
	[ -f "$c" ] && src="$c"
	[ -f "$S" ] && src="$S"

	echo "$o: $cnt, src: $src"
	extractcli "$o"
done
As usual awk comes to the rescue.

-ulianov

P.S. I looked for a decent decompiler for Linux other than objdump, but I found only ancient ones, all broken. This is annoying, as sometimes I need to see which external functions are called by an object file (.o).

Monday, December 7, 2009

DTS Headaches on a PPC440EPx Board

I had problems trying to get an ST RTC chip recognized by Linux. The chip was attached to I2C bus 0 at address 0x68; it is supported by Linux, but the two did not talk.

This is a custom board made specially for my US employer. The vendor brought up U-Boot, tested the peripherals with it, and that was about it. I tried the denx.de kernel in a Sequoia configuration with no luck; I hassled the board vendor and they gave me a Linux kernel that booted on the board but did not detect much of the hardware.

By playing with the kernel config I got it to sniff the NOR flash and to partition it using a command-line scheme, but the RTC chip remained a mystery.

I dug a bit into the kernel source and docs and learned that the kernel expects a flattened device tree from the bootloader, yet the U-Boot on this board does not provide one. The vendor provided a DTS file with a textual representation of the devices, but it did not describe the RTC correctly.

By poking around I learned that the PPC kernel matches the text labels ("compatible" strings) provided by the device drivers against the labels in the device tree. I hacked those a bit and the RTC works fine now.

-ulianov

Friday, December 4, 2009

Linux 2.6.31/PREEMPT_RT and Hard IRQs

I was trying to improve the latency of an ISR that serves an IRQ firing every 100 µs. Run as a softirq, even at the highest priority, its latency would sometimes slip to 1500 µs, which is unacceptable (dd if=/dev/hda of=/dev/null comes to mind as a culprit; hda is a slooow Compact Flash).

It does not help that the ISR crunches numbers and does floating-point calculations in interrupt context. This ISR munches about 25 µs and should execute as close as possible to the beginning of the 100 µs interval.

So I tried to use a hard IRQ (IRQF_NODELAY), which was not good: the kernel would hit a BUG_ON in rtmutex.c:807 whenever a process tried to read from the character device belonging to the driver that used said IRQ.

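I learned that PREEMPT_RT overloads spin_lock/spin_unlock with rt_mutex_lock/rt_mutex_unlock, which do not seem to be compatible with hard IRQs: the mutex gets double-locked by the "current" process, although the two contenders are the hard IRQ and code executed on behalf of a user process. (A hard IRQ has no task of its own; it runs on top of whichever task it interrupted, so to the rt_mutex code it looks as if "current" is taking a lock it already holds.) I think this is bonkers.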

I managed to get it working by using the "atomic_" spinlock primitives, which basically boil down [in the disassembly] to cli/sti.

-ulianov

Thursday, December 3, 2009

A Nice Kernel Trick for x86

Today I learnt a nice way to obtain the number of microseconds since CPU power-up using the time-stamp counter introduced with the P5 (Pentium), read via the rdtsc instruction. This instruction returns the number of clock ticks since power-up, which can be scaled to microseconds by dividing by the CPU frequency in MHz (assuming the TSC ticks at a constant rate).

The kernel code reads:
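#include <linux/kernel.h>
#include <asm/div64.h>   /* do_div() */
#include <asm/tsc.h>     /* cpu_khz */

unsigned long long uSecSinceBoot(void)
{
	unsigned int lo, hi;
	unsigned long long div;

	/* rdtsc puts the 64-bit tick count in edx:eax */
	__asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));

	/* ticks / kHz = ms, so ticks * 1000 / kHz = us;
	 * note: ticks * 1000 wraps after ~2^64/1000 ticks (about 71 days at 3 GHz) */
	div = (((unsigned long long)hi << 32) | lo) * 1000;
	do_div(div, cpu_khz); // TRICK! div is modified in place, do_div() returns the remainder

	return div;
}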
I am using do_div() because on 32-bit x86 GCC implements 64-bit (long long) division by calling an external helper (__udivdi3), which normally lives in libgcc, and the kernel does not link against libgcc.

-ulianov

P.S. I did extract __udivdi3 from libgcc and then disassembled/reassembled it, but it's a pain. do_div() is the right thing to use in the kernel.

Friday, September 18, 2009

The Strange Case of Multiplying Zombies

While working on a Linux/PPC embedded board I changed /etc/rc.sh (the system startup script as known by Busybox) to start our application in the foreground.

This was good, as we could see its output on the serial console and interact with it. The application is a standalone binary that is queried via CGI through a Web interface. The web server is thttpd.

So I had everything running, looked at the process table, and noticed it filling up with zombie (Z) entries of the logger CGI (which gets invoked thrice a minute).

I tried to trace (strace) thttpd, I even put a waitpid(-1) at the top of its main loop [it's a single-threaded web server] and still could not get the damn zombies reaped!

This was baad as the system could stay up only for half a day before filling up the process table.

I did some hard thinking, remembered some APUE and Bach bits, and concluded that Busybox [which, alas, contains init] must still be waiting for /etc/rc.sh to terminate before it starts the prescribed init behaviour, i.e. reaping orphaned processes and zombies!

So I put our application in the background via nohup and voilà! Everything was good again.

-ulianov

Saturday, August 1, 2009

Mutex Problems on PPC, Again

One would think that doing
pthread_mutex_t mutex;
pthread_mutex_init(&mutex, NULL);
would yield a usable mutex.

Yet this is not always the case: on denx/PPC 440 strange things can happen (see the previous post When Classes Instantiated as auto Vars on Stack are Evil).

It turns out that the "correct" code sequence is
pthread_mutex_t mutex;
memset(&mutex, 0, sizeof(mutex)); // Voodoo for PPC
pthread_mutex_init(&mutex, NULL);
otherwise, under some circumstances (e.g. mutexes ending up on the stack in the belly of an object), pthread_mutex_lock() would block on this mutex forever at the first invocation.

-ulianov

Tuesday, April 7, 2009

When Classes Instantiated as auto Vars on Stack are Evil

I have a threaded C++ app that uses message queues to pass data among threads like so (UML sequence diagram follows):
Thread A (with response-Q) enqueues request in Thread B's input-Q
Thread A blocks on response-Q empty
Thread B wakes up [input-Q non-empty], dequeues request
Thread B munches on the request from A
Thread B enqueues result in A's response-Q
Thread B blocks on input-Q empty
Thread A wakes up [response-Q non-empty], dequeues response
Thread A goes its merry way.

Nice and easy, eh? And it has worked OK for a while...

You know how it is: as one keeps adding code, the bugs get shifted around on the shelf and rear their nasty heads. In my case Thread A was the main thread and its response-Q was declared as
Queue respQ("A's response Q");
Now this is an auto var that lives on Thread A's stack.

Thread B was doing its job, but when it wanted to enqueue the response in Thread A's respQ [it received &respQ as a parameter] it would block in
pthread_mutex_lock()
in libc.

Bummer! I spent three hours writing LOCK/UNLOCK macros in class Queue that would confess who called them and from which thread, and matching up the results (yes, gdb was borked on that ppc target and useless with inf thr et al.). I really saw that Thread A's Queue instance was indeed blocking in
pthread_mutex_lock()
but nobody had locked that mutex before!!

The funny part is that I had that mutex properly initialised in Queue::Queue(); I even changed its attribute to error-checking but it just did not help! That mutex would behave as if uninitialised and containing garbage!

After a while you get bored of this kind of debugging so I changed Thread A's code to read
Queue* respQ = new Queue("A's response Q");
and everything went smooth afterwards.

This yields the following article of faith:
Objects declared as class auto on stack are evil.
-ulianov

Thursday, April 2, 2009

When GCC is Not Smart Enough to Help

Have you noticed that later versions of gcc have added lots of bogus warnings (turned into errors by -Werror) that suck the joy out of programming? Well, even with
-Wall -Werror
sometimes it won't help:
class X {
public:
typedef enum {
ERR1,
ERR2
} IOErrorCode;
const char* IOstrerror(IOErrorCode c); // string describing the error
};

struct S {
// ...
X::IOErrorCode status;
};
On another day and in another file I coded:
struct S* res = (struct S*)malloc(sizeof(struct S)); // C++ needs the cast
if(! DoSomeIO(res))
	printf("IO/Error: %s (%d)\n",
		X::IOErrorCode(res->status), res->status);//(*)
and when I ran the program I would get a segfault at line (*) and GDB was indicating that the stack was partially smashed! Niice!

After scratching my head for half an hour it occurred to me that I had made a mistake: I coded
IOErrorCode(res->status) // BAAD
instead of
IOstrerror(res->status) // OK
The former is [in C++] a functional-style cast to type IOErrorCode, so printf() receives an integer where %s expects a char* and crashes.

The latter is a function call.

Ha! Not paying attention to my own code! And I had this sequence in five places handling I/O errors!

This is the most dangerous kind of error as one hits this code path infrequently (sometimes it only happens at a client's site thus driving the client mad).

-ulianov

Tuesday, March 31, 2009

When One Needs to Hide Things in Plain Sight on the WWW

Let's say that a web page needs to show content according to various local sensibilities and that it's hosted by a 3rd party that does not provide server-side scripting. What to do?

It's simple! JavaScript+DOM+CSS come to the rescue: simply have a <div> with style="display:none" and store your content there in a scrambled plain-text form.

Then load a JavaScript script from a server that you control, where the script is generated on the fly by a server-side program.

In that server-side program, decide based on $REMOTE_ADDR whether the user is authorized to view the content, and spit out either the descrambling JavaScript code or some dummy code.

Simple, eh?

The negative side-effect is that the search engines won't index your content. The positive side-effect is that the search engines won't index your content (maybe you don't like other people to use your content for their own purposes, e.g. keywordspy·com) and spammers won't be able to grab e-mail addresses from your pages ;)

-ulianov

Monday, March 30, 2009

When Networking != IPC

To some people it stands firmly to reason that
Networking != IPC
This boggles the mind, as they see the only way for two (or more!) applications [living on the same machine] to communicate as being SHM and semaphores (aka "mailbox & lock"). They say it's faster.

It is -- Stevens states in UNP that such an approach is 30% faster than AF_UNIX sockets.

However there are some minor drawbacks in this communication pattern:
a. it cannot be extended across hosts (for sockets it's transparent, endianness notwithstanding);
b. it is only half-duplex (no better than the venerable pipes/FIFOs) so you need two such message queues;
c. there is no notification of a peer's death (TCP/IP sockets can send keep-alives and these can be tuned);
d. the notification of a pending message is wasteful and at best awkward: one needs to dedicate a thread on the receiving side to block on the semaphore; this can be mitigated with pthread_cond_timedwait but it cannot be mixed with select-ing on sockets and you'll end up with a thread babysitting the mailboxes;
e. if there is more than one receiver process then the receivers must hunt down the messages addressed to them in the mailboxes; worse if one of the receivers gets bored and ends execution its messages are cleaned up by no-one (can I smell a DoS?);
f. the data that can be transported via mailboxes is limited by the size of the mailboxes and one may have to resort to fragmentation -- things can get very hairy at this point;
g. this pattern is not observable, i.e. one cannot use tcpdump to look at the packets going to and fro; one must build custom tools to observe the data flow;
h. depending on the type of SHM used (e.g. SysV SHM) the mailboxes and their contents may persist after the death of all the parties involved; this can be good if one wanted that, or very bad if the server process must clean up at start-up.

To me such socket-phobia is inexplicable (thinking with a UN*X programmer's mind, that is). I do recall, though, the contortions, the horrendous API and the sad performance of Winsock. Yet have I mentioned that this whole mailbox/lock brouhaha has to happen on Linux?

-ulianov

Saturday, March 28, 2009

A Nightmarish Fantasy

Please stand by...