I was trying to improve the latency of an ISR that services an IRQ firing every 100 us. Run as a softirq, even at the highest priority, its latency would sometimes slip to 1500 us, which is unacceptable (dd if=/dev/hda of=/dev/null comes to mind as a trigger; hda is a very slow CompactFlash card).
It does not help that the ISR crunches numbers and does floating-point calculations in interrupt context. The ISR takes about 25 us and should execute as close as possible to the beginning of each 100 us interval.
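To put those numbers in perspective, here is a small back-of-the-envelope sketch in plain user-space C. The constants are the figures quoted above; the helper names are made up for illustration and are not part of the driver.

```c
#include <stdio.h>

/* Figures quoted in the post. */
#define PERIOD_US         100   /* the IRQ fires every 100 us */
#define ISR_COST_US        25   /* the ISR runs for about 25 us */
#define WORST_LATENCY_US 1500   /* worst latency seen with the softirq */

/* Percentage of every period spent inside the ISR (hypothetical helper). */
int isr_load_percent(int cost_us, int period_us)
{
    return cost_us * 100 / period_us;
}

/* Largest start delay that still lets the ISR finish before the next IRQ. */
int latency_slack_us(int cost_us, int period_us)
{
    return period_us - cost_us;
}

/* Number of whole periods that go by while the ISR is being delayed. */
int periods_overrun(int latency_us, int period_us)
{
    return latency_us / period_us;
}

/* Print the budget for the figures above. */
void print_budget(void)
{
    printf("ISR load:        %d%%\n",
           isr_load_percent(ISR_COST_US, PERIOD_US));
    printf("latency slack:   %d us\n",
           latency_slack_us(ISR_COST_US, PERIOD_US));
    printf("periods overrun: %d\n",
           periods_overrun(WORST_LATENCY_US, PERIOD_US));
}
```

Calling print_budget() from main() prints the three figures. With these inputs the ISR alone uses a quarter of the CPU, tolerates at most 75 us of start delay, and a 1500 us slip spans 15 full periods.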
So I tried a hard IRQ (IRQF_NODELAY), which did not work: the kernel would hit a BUG_ON at rtmutex.c:807 whenever a process tried to read from the character device belonging to the driver that uses this IRQ.
I learned that under PREEMPT_RT spin_lock/spin_unlock are mapped onto rt_mutex_lock/rt_mutex_unlock, which do not seem to be compatible with hard IRQs: the mutex appears to be double-locked by the "current" process, although the two contenders are really the hard IRQ and code executing on behalf of a user process. I think this is bonkers.
I managed to get it working by using the "atomic_" spinlock primitives, which basically boil down (in the disassembly) to cli/sti.
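For anyone wondering why a non-sleeping lock sidesteps the problem, here is a rough user-space model. This is NOT kernel code and not my driver: model_raw_lock, shared_sample, isr_publish and reader_snapshot are hypothetical names, and a C11 atomic_flag stands in for the kernel primitive. As far as I understand it, the point is that a lock like this only ever busy-waits and keeps no owner, so there is no "current" task to be blamed for a double lock.

```c
#include <stdatomic.h>

/* Hypothetical user-space stand-in for a raw, non-sleeping spinlock. */
typedef struct {
    atomic_flag locked;
} model_raw_lock;

/* Data shared between the "ISR" side and the "read()" side. */
struct shared_sample {
    model_raw_lock lock;
    double value;        /* latest computed result */
    unsigned long seq;   /* number of results published so far */
};

void model_lock(model_raw_lock *l)
{
    /* Busy-wait only: never sleeps and records no owner. */
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire))
        ;
}

void model_unlock(model_raw_lock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

void sample_init(struct shared_sample *s)
{
    atomic_flag_clear(&s->lock.locked);
    s->value = 0.0;
    s->seq = 0;
}

/* What the interrupt side would do: publish a new result. */
void isr_publish(struct shared_sample *s, double v)
{
    model_lock(&s->lock);
    s->value = v;
    s->seq++;
    model_unlock(&s->lock);
}

/* What the read() side would do: copy a consistent snapshot. */
unsigned long reader_snapshot(struct shared_sample *s, double *out)
{
    unsigned long seq;

    model_lock(&s->lock);
    *out = s->value;
    seq = s->seq;
    model_unlock(&s->lock);
    return seq;
}
```

The model leaves out the part that matters most on the real thing: in the kernel the reader must also keep local interrupts off while it holds the lock, otherwise the hard IRQ could interrupt the holder and spin forever. On a uniprocessor build that interrupt masking is all that is left of the lock, which would explain the cli/sti in the disassembly.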