Monday, December 14, 2009

How to Learn What Code is Actually Disabling IRQs?

When one applies the PREEMPT_RT patch, the spinlocks and mutexes get horribly overloaded. Direct use of cli/sti is frowned upon (x86), so one ends up with macros of macros of macros for locking, and it is a headache to see which functions are actually disabling interrupts.

Yet we are on x86 and the locking sequence generated by GCC is simple:
cli
...
sti
or could be a bit more complicated:
pushf
pop %reg
cli
...
push %reg
popf
Luckily objdump and a bit of massaging come to the rescue in the form of a neat script that finds all the object files that make up the compiled kernel and then searches the disassembler output for these sequences.

Bonus: the function names that do the locking are also printed.

Here it comes [run it at the root of the kernel tree after you compile the kernel]:

#!/bin/bash
# This file is licensed under GPLv2

# Note this works only with x86 code

[ -z "$OBJDUMP" ] && OBJDUMP=${CROSS_COMPILE}objdump

# temporary list of object files, removed on exit
obj_list=$(mktemp) || exit 1
trap 'rm -f "$obj_list"' 0 1 2 15

# print the functions of one object file that contain cli/sti/popf
function extractcli()
{
    local file="$1"

    [ ! -r "$file" ] && return 1;

    $OBJDUMP -dSCl "$file" | awk 'BEGIN {
        idx = ""
    }
    /(^[a-zA-Z_]|cli[^a-z]|popf|sti[^a-z])/ && !/(file format|Disassembly)/ {
        if($0 ~ /^[a-zA-Z_]/) {
            idx = $0    # a label line: remember it as the current function
        } else {
            # an instruction line: keep its address and mnemonic
            F[idx]=(F[idx] "\n" "\t" $1 "\t" $NF);
        }
    }
    END {
        for(idx in F) {
            print (" " idx F[idx])
        }
    }'
    return 0
}

# main

find . -type f -name '*.o' > "$obj_list"
for obj in $(cat "$obj_list")
do
    o=$(echo $obj | sed 's/^\.\///g')
    [ "$o" = 'vmlinux.o' ] && continue; # this is the whole kernel
    echo $o | grep -q 'built-in\.o' && continue; # these are aggregations

    echo -en " \r$o" 1>&2
    cnt=$($OBJDUMP -dS $o | grep -cw cli)
    [ "$cnt" = '0' ] && continue;

    echo -en " \r" 1>&2

    # guess the source file from the object name
    src=
    c=$(echo $o | sed 's/\.o$/\.c/g')
    S=$(echo $o | sed 's/\.o$/\.S/g')
    [ -f "$c" ] && src="$c"
    [ -f "$S" ] && src="$S"

    echo "$o: $cnt, src: $src"
    extractcli $o
done

As usual awk comes to the rescue.
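
For readers who prefer C to awk, the filtering step can be sketched roughly as below. This is only an illustration of the idea, not part of the script: the function names has_token and touches_irq_flag are made up for this sketch, and it assumes the mnemonic appears in the line as a separate token, the way objdump prints it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True for characters that can be part of an identifier or mnemonic. */
int is_ident_char(int c)
{
    return isalnum(c) || c == '_';
}

/*
 * Return 1 if tok occurs in line with no identifier character directly
 * before it. If allow_suffix is 0, no identifier character may follow it
 * either (so "cli" does not match inside "client_init").
 * Hypothetical helper written for this sketch.
 */
int has_token(const char *line, const char *tok, int allow_suffix)
{
    size_t len = strlen(tok);
    const char *p;

    assert(len > 0);
    for (p = strstr(line, tok); p != NULL; p = strstr(p + 1, tok)) {
        int left_ok = (p == line) || !is_ident_char((unsigned char)p[-1]);
        int right_ok = allow_suffix || !is_ident_char((unsigned char)p[len]);

        if (left_ok && right_ok)
            return 1;
    }
    return 0;
}

/*
 * Return 1 if a disassembly line contains cli, sti, or a popf variant.
 * popf is matched as a prefix so that popfl and popfq also count.
 */
int touches_irq_flag(const char *line)
{
    return has_token(line, "cli", 0) ||
           has_token(line, "sti", 0) ||
           has_token(line, "popf", 1);
}
```

Feeding each line of the objdump output through touches_irq_flag() would give roughly the same hits as the awk pattern; the awk version is still shorter.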


P.S. I looked for a decent disassembler for Linux other than objdump but I found only ancient ones, all of them broken. This is annoying as sometimes I need to see what external functions are called by an object file (.o).

Monday, December 7, 2009

DTS Headaches on a PPC440EPx Board

I had problems trying to get an ST RTC chip to be recognized by Linux. The chip was attached to I2C bus 0 at address 0x68. The chip is supported by Linux, but the two did not talk.

This is a custom board made specially for my US employer. The vendor brought up U-Boot, tested the peripherals with it and that was about it. I tried the kernel in a Sequoia configuration, with no luck. I badgered the board vendor and they gave me a Linux kernel that booted on the board but did not detect much of the hardware.

By playing with the kernel config I got it to sniff the NOR flash and to partition it using a command-line scheme, but the RTC chip was a mystery.

I dug a bit in the kernel source and docs and learned that the kernel expects a flattened device tree from the bootloader, yet U-Boot does not provide that. The vendor provided a DTS file that has a textual representation of the devices, but it did not get the RTC right.

By poking around I learned that the PPC kernel is matching text labels ("compatible") provided by the device drivers against the labels in the device tree. I hacked those a bit and the RTC works fine now.
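
To make the matching idea concrete, here is a simplified userspace sketch in C. It is not the kernel's code: the struct and function names are invented for illustration, the "vendor,rtc-a" style labels used with it are placeholders rather than the real strings for this chip, and it assumes an exact string comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_COMPAT 4

/* Hypothetical stand-in for one entry of a driver's match table. */
struct match_entry {
    const char *compatible;
};

/* Hypothetical stand-in for a device tree node. */
struct dt_node {
    const char *name;
    const char *compatible[MAX_COMPAT]; /* NULL-terminated, most specific first */
};

/*
 * Return the first "compatible" label of the node that also appears in
 * the driver's table, or NULL when the driver would not bind to the node.
 */
const char *find_match(const struct dt_node *node,
                       const struct match_entry *table, size_t n)
{
    size_t i, j;

    assert(node != NULL && table != NULL);
    for (i = 0; i < MAX_COMPAT && node->compatible[i] != NULL; i++)
        for (j = 0; j < n; j++)
            if (strcmp(node->compatible[i], table[j].compatible) == 0)
                return node->compatible[i];
    return NULL;
}
```

With a table that lists only "vendor,rtc-a", a node labelled "vendor,rtc-a" matches, while a node carrying a slightly different label gets no driver at all, which is the kind of silent mismatch described above.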


Friday, December 4, 2009

Linux 2.6.31/PREEMPT_RT and Hard IRQs

I was trying to improve the latency of an ISR which serves an IRQ that fires every 100 us. As a softirq, even when running at the highest priority, the latency would sometimes slip to 1500 us, which is unacceptable (dd if=/dev/hda of=/dev/null comes to mind as a cause; hda is a slooow Compact Flash).

It does not help that the ISR is crunching numbers and doing floating-point calculations in interrupt context. This ISR munches about 25 us and should execute as close as possible to the beginning of the 100 us interval.

So I tried to use a hard IRQ (IRQF_NODELAY), which was not good: the kernel would hit a BUG_ON in rtmutex.c:807 whenever a process tried to read from the character device belonging to the driver that used that IRQ.

I learned that under PREEMPT_RT spin_lock/spin_unlock are overloaded with rt_mutex_lock/rt_mutex_unlock, which do not seem to be compatible with hard IRQs: the mutex appears to be double-locked by the "current" process, although the two contenders are the hard IRQ and code executed on behalf of a user process. I think this is bonkers.

I managed to get it working by using "atomic_" spinlock primitives which basically boil down [in disasm output] to cli/sti.


Thursday, December 3, 2009

A Nice Kernel Trick for x86

Today I learnt a nice way to obtain the number of microseconds since the power-up of the CPU using the time-stamp counter introduced with the P5, read via the rdtsc instruction. This instruction returns the number of ticks since power-up, which can be scaled to microseconds by dividing the number by the CPU frequency (in MHz).

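The kernel code reads:
#include <linux/kernel.h>
#include <linux/cpufreq.h>
#include <asm/div64.h>

unsigned long long uSecSinceBoot(void)
{
    volatile unsigned long long int cpu_ticks;
    unsigned long long div;

    /* "=A" returns the edx:eax pair as one 64-bit value (32-bit x86 only) */
    __asm__ __volatile__(
        "rdtsc\n\t"
        "mov %%edx, %%ecx\n\t"
        :"=A" (cpu_ticks)
        :
        :"ecx");

    /* ticks * 1000 / kHz == ticks / MHz == microseconds */
    div = cpu_ticks * 1000;
    do_div(div, cpu_khz); // TRICK! div is modified, do_div() returns remainder

    return div;
}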
I am using do_div() because on 32-bit x86 GCC implements long long division through an external function (__udivdi3), which usually lives in libgcc but is not provided by the kernel.
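
The scaling arithmetic and the divide-in-place behaviour can be tried outside the kernel with a small userspace sketch. The helper div_in_place below only imitates what do_div() does (quotient left in the first argument, remainder returned); it is a hypothetical stand-in, not the kernel macro, and ticks_to_usec is likewise a name made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace imitation of do_div(): divides *n in place and returns the
 * remainder. Hypothetical helper, not the kernel macro.
 */
uint32_t div_in_place(uint64_t *n, uint32_t base)
{
    uint32_t rem;

    assert(base != 0);
    rem = (uint32_t)(*n % base);
    *n /= base;
    return rem;
}

/*
 * Scale raw counter ticks to microseconds: us = ticks * 1000 / kHz.
 * The multiplication can overflow 64 bits for very large tick counts.
 */
uint64_t ticks_to_usec(uint64_t ticks, uint32_t khz)
{
    uint64_t q = ticks * 1000ULL;

    (void)div_in_place(&q, khz); /* remainder is not needed here */
    return q;
}
```

For example, 2,000,000,000 ticks on a 2 GHz CPU (cpu_khz of 2,000,000) scale to 1,000,000 microseconds, i.e. one second.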


P.S. I did extract __udivdi3 from libgcc and then disassembled/reassembled it, but it's a pain. do_div is the right thing to use in the kernel.