Thursday, June 2, 2016

A (Software) Bug's Life

A local recruiter pinged me today with:
Specifically, we need someone very skilled working through a Linux OS and programming using C Language. This is for a 4 to 6 week assignment with a medical device company in Oakville, ON. They’re currently at the integration stage of a project, but have a section of code causing issues so they would like an expert that can come in, analyze the code and recommend/implement changes to be made.
Which got me reflecting on bug classes:
  • Segfaults/memory leaks: These (as the blog title suggests) are relatively* easy to debug as the code "speaks" about its defectiveness: it crashes, Valgrind flags a bunch of problems, ElectricFence does some brutal boundary checking. Coverity (or Klocwork) will catch some problems statically (see the sketch after this list).
  • Logic errors: The code does not do what it's designed to, but in a corner-case, sneaky kind of way. A good QA department [which many times shines by its absence] will catch most of them. Sometimes the customer screaming on the phone focuses the mind.
  • Architectural flaws: The code is unsuited for the problem at hand, either because it's old and wasn't designed for the current purpose or because the Architect was an imbecile [it happens]. These are the worst as they require the whole circus of refactoring/rearchitecting and lots of NRE for testing.
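Here is a minimal sketch of the first class: a heap overflow that is invisible in a normal run but that Valgrind flags immediately (the array size and loop bound are made up for illustration):

#include <stdlib.h>

int main(void)
{
   int* v = malloc(8 * sizeof(int));
   for(int i = 0; i <= 8; i++)  /* BUG: writes one element past the end */
      v[i] = i;
   free(v);
   return 0;
}

The program may well exit cleanly, but "valgrind ./a.out" reports an "Invalid write of size 4" at the offending line.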
-ulianov

* = After 16 years of doing that :(

Tuesday, January 21, 2014

A Perl Tidbit (aka Evil P3rl R Us)

Say you have Perl script A which is deployed in multiple sites, does its job well and thus should not be changed/refactored. In my case it takes a phone number on the command line and decides whether it's from Canada. The verdict is communicated via the exit code.

At one site one needs to write script B which does what A does but has a different interface incompatible with the one of A (in my case it must be an Asterisk AGI script).

Refactor? Nah, that's for wimps. Call A from B as a subshell? Suboptimal, for wimps only.

We need to go old skool and use Perl's do built-in. Never heard of perl-do? Not for the faint-hearted.

Bad news: A calls exit(1) in multiple places. That bombs B as well. Two things to try: use $SIG{__DIE__} [doesn't work satisfactorily] or overload exit:

#!/usr/bin/perl

use strict;
use warnings;

use Asterisk::AGI;

push @INC => '/root'; # A.pl needs this :(

my $AGI = new Asterisk::AGI;

my %input = $AGI->ReadParse();

$ARGV[0] = $input{callerid}; # fake passing args using command line

my $canada;
eval {
   no warnings;
   local *CORE::GLOBAL::exit = sub {
     my $excode = shift;
     $AGI->noop("Exit code was $excode");
     $canada = !int($excode);
     goto out; # A.pl calls the overloaded exit twice w/o this (!!)
   };

   -r '/root/A.pl' or $AGI->noop("Cannot read A.pl: $!");
   $AGI->noop("do() FAILED $@ - $!") unless do '/root/A.pl';
};
out:
$AGI->set_variable('RESULT', $canada? 'YES': 'NO');

0;
-ulianov

Sunday, January 5, 2014

How to Find the Process ID of this instance of CMD.EXE

Say you are writing a CMD.EXE script and would like to know the PID of the shell. Easy? No. CMD.EXE does not expose its own PID the way Bash does with $$, so it takes a bit of contorting to get there (and you will mess up the window title):
@echo off

set r=%RANDOM%
title "SUPERSECRETTITLE %r%"
for /f "usebackq tokens=2" %%a in (`" \
  tasklist /v /fi "imagename eq cmd.exe"| \
  %windir%\System32\find "SUPERSECRETTITLE %r%" \
  "`) do set PID=%%a

echo PID is %PID%
Note: the "for" command above is a single superlong line -- it must stay that way for this to work.

Bonus (continuation): Find out if this CMD.exe was started with "/K" so you know to do "exit 0" or "exit /b 0":
wmic process where (name="cmd.exe" and processid=%PID%) get commandline /value | %windir%\system32\find " /K " >nul
set SLASHK=%ERRORLEVEL%
The point of all this is that one does not need PowerShell to do useful Win32 scripting.

-ulianov

Friday, October 25, 2013

NULL Pointer in __up() in Custom Driver

My client calls and says he has a panic; I request stack traces and one comes to me as cute as lemon pie (see below).

This is Linux 2.6.29 with RTAI and has been running well for three years in production systems.

So the bug cannot be in __up(). I disassembled the crash point and it looked like
*(%ecx + 4) := %eax
This looks like "NULL->offset4". The kernel code looked like "waiter_list->prev". Hmm.

This must be something sprinkling memory. I reviewed custommodule and indeed some debug code looked like
array[index++] = debug_info;
No checking on array bounds, see?
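The fix is the obvious guard (a sketch; the real array and index names were project-specific):

#define DEBUG_SLOTS 1024  /* hypothetical capacity of the debug array */

if(index < DEBUG_SLOTS)
   array[index++] = debug_info;
/* else: drop the sample -- better than sprinkling a neighbour's memory */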

Customer reworked the code and lived happily until the next crash ;)

-ulianov

PS. There were other crashes caused by the same problem from other power cycles but none as beautiful and explicit as this.
BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c0286e8a>] __up+0xb/0x2e
*pde = 365c1067 *pte = 00000000 
Oops: 0002 [#1] 
Modules linked in: custommodule(P) module3x20(P) moduleDSPcode(P)\
                   rdtsc customdebug coretemp fakertnet(P) e1000e \
                   irqregistrar(P) \
                   rtai_smi rtai_mbx rtai_sched \
                   rtai_math rtai_hal uhci_hcd

Pid: 1873, comm: customproc.bin Tainted: P (2.6.29.6-kernel8-ipipe #54)  
EIP: 0060:[<c0286e8a>] EFLAGS: 00010007 CPU: 0
EIP is at __up+0xb/0x2e
EAX: 73694c67 EBX: 00000200 ECX: 00000000 EDX: 00000000
ESI: f65cdbe0 EDI: f9feed50 EBP: f65cdb14 ESP: f65cdb14
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process mts5000.bin (pid: 1873, ti=f65cc000 task=f70eacc0 task.ti=f65cc000)
I-pipe domain Linux
Stack:
 f65cdb20 c012627a f9f286d4 f65cdbec f86a32c4 00004e1f 00004e20 f9e72441
 00000013 00000002 000c0f19 00000001 f86b819b 20203130 65532f3c 6e697474
 73694c67 00003e74 00000000 00000000 00000000 00000000 00000000 00000000
Call Trace:
 [<c012627a>] ? up+0x2e/0x44
 [<f86a32c4>] ? dequeueCommsRequest+0x217/0x225 [custommodule]
 [<f86a0382>] ? customdriver_read+0x15e2/0x1cdb [custommodule]
 [<c01331b8>] ? __ipipe_restore_root+0x16/0x18
 [<c01331b8>] ? __ipipe_restore_root+0x16/0x18
 [<c0131e6e>] ? cpu_quiet+0x71/0xcb
 [<c0118ff1>] ? __do_softirq+0xc5/0xcd
 [<c0119110>] ? irq_exit+0x28/0x2a
 [<c0104285>] ? do_IRQ+0x55/0x68
 [<f86b0024>] ? pfc_runInInterrupt+0xe0/0x6cf [custommodule]
 [<f86ae1a4>] ? sampleInterruptHandler+0x2944/0x2958 [custommodule]
 [<f86ae1a4>] ? sampleInterruptHandler+0x2944/0x2958 [custommodule]
 [<f86b0024>] ? pfc_runInInterrupt+0xe0/0x6cf [custommodule]
 [<f86ae1a4>] ? sampleInterruptHandler+0x2944/0x2958 [custommodule]
 [<c011222a>] ? enqueue_task_fair+0x12b/0x133
 [<c0110df5>] ? check_preempt_wakeup+0x82/0xa5
 [<c0112922>] ? try_to_wake_up+0xa2/0xad
 [<c0112944>] ? wake_up_state+0xa/0xc
 [<c011d59f>] ? signal_wake_up+0x51/0x55
 [<c011d717>] ? complete_signal+0x174/0x18c
 [<c011d8b1>] ? send_signal+0x182/0x197
 [<c01331b8>] ? __ipipe_restore_root+0x16/0x18
 [<c011df89>] ? group_send_sig_info+0x54/0x5d
 [<c011dfbd>] ? kill_pid_info+0x2b/0x35
 [<c011e129>] ? sys_kill+0x6f/0x114
 [<f869eda0>] ? customdriver_read+0x0/0x1cdb [custommodule]
 [<c014ffbe>] ? vfs_read+0x87/0x101
 [<c01500d1>] ? sys_read+0x3b/0x60
 [<c0102c07>] ? syscall_call+0x7/0xb
EIP: [<c0286e8a>] __up+0xb/0x2e SS:ESP 0068:f65cdb14
---[ end trace 2aa77bbc7c743932 ]---

Thursday, August 1, 2013

What to do about a dead-slow disk I/O VPS?

My new VPS provider uses OpenVZ and the disk I/O is even slower than that of my previous hoster.

I wrote an Asterisk AGI application which compiles weather info from various sources. It is Perl (so it uses many piddly modules/shared objects on-disk) and instructs Asterisk to stream ~100 gsm/sln audio files.

When using the naked VPS this is slow, as each of these files needs to be loaded into memory from disk.

So what to do? Wire them into the Linux file cache, of course! I compiled a list of all non-system shared objects [e.g. not libc/libm/libdl] my Perl apps/CGI scripts use, plus all the sound files I need, and I am mmap/mlock-ing them in core using the utility below.

The added bonus is that my external bit of code that does the Gōōgle TTS now takes only 14s to complete instead of 54s, which in audio terms is almost instantaneous.

I am not quite sure if I have to keep the fds open... have to experiment.

-ulianov
// This code is licensed under GPLv2
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <unistd.h> /* close(2), sleep(3) */

#ifndef O_LARGEFILE
  #define O_LARGEFILE  0100000
#endif

typedef struct {
  char name[513];
  int fd;
  size_t size;
  const void* mmaploc;
} mapinfo_t;

char* name = "mmaplock";

size_t mmap_mlock(const char* name, mapinfo_t* info)
{
   if(name == NULL || name[0] == '\0') return (size_t)-1;
   if(info == NULL) return (size_t)-1;

   const int fd = open(name, O_RDONLY|O_LARGEFILE);
   if(fd < 0) {
      fprintf(stderr, "%s: Cannot open %s for reading: %s\n",
              __func__, name, strerror(errno));
      return (size_t)-1;
   }

   struct stat st;
   if(fstat(fd, &st) < 0) {
      fprintf(stderr, "%s: Cannot stat %s: %s\n",
              __func__, name, strerror(errno));
      close(fd);
      return (size_t)-1;
   }

   void* memblock = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
   if (memblock == MAP_FAILED) {
      fprintf(stderr, "%s: Cannot mmap %s for reading: %s\n",
              __func__, name, strerror(errno));
      close(fd);
      return (size_t)-1;
   }
   if(mlock(memblock, st.st_size) == -1) {
      fprintf(stderr, "%s: Cannot mlock %s (size %d): %s\n",
              __func__, name, st.st_size, strerror(errno));
      munmap(memblock, st.st_size);
      close(fd);
      return (size_t)-1;
   }

   strncpy(info->name, name, 512); info->name[512] = '\0';
   info->fd = fd;
   info->size = st.st_size;
   info->mmaploc = memblock;

   return st.st_size;
}

mapinfo_t info[1021]; // this many fd's available by default

#define MAX_FILES (sizeof(info)/ sizeof(info[0]))

void usage()
{
   fprintf(stderr, "usage: %s \n", name);
   exit(0);
}

int main(int argc, char* argv[])
{
   if(argc < 2) usage();

   const char* listName = argv[1];

   FILE* flist = fopen(listName, "r");
   if(flist == NULL) {
      fprintf(stderr, "%s: Cannot open %s for reading: %s\n",
              name, listName, strerror(errno));
      return 1;
   }

   close(0); // save one fd
   int count = 0;
   char buf[513];
   while(fgets(buf, sizeof(buf), flist) != NULL) {
      if(buf[0] == '#') continue;

      int N = strlen(buf);
      while(N > 0 && (buf[N-1] == '\r' || buf[N-1] == '\n')) buf[--N] = '\0';
      if(N == 0) continue;

      mmap_mlock(buf, &info[count++]);

      if(count >= MAX_FILES) break;
   }
   fclose(flist);

   for(;;)
      sleep(60); // just stay alive -- ping mmaped files?

   return 0;
}

Monday, October 15, 2012

SH-2 Aquarius Kernel Debugging Done in Unusual Ways

I had problems and lockups (mostly caused by me) while developing device drivers for Linux kernel 3.4.

However this needed debugging, and sometimes psychic debugging and Jedi mind tricks do not suffice. SH-2 has no protection between user/kernel mode for code or memory access; the only thing it cares about is unaligned memory access.

JTAG was not implemented in RTL on this hacked instance of the CPU core (which ran in a Spartan-6 FPGA). GDB/serial was working fine but only for the boot loader as the kernel took over the interrupt vectors used by GDB. The core ran at 25 MHz which is a bit too fast for printf debugging.

So I did the reasonable thing: I applied the thumb screws to the powers-that-be and to the RTL engineer supporting me, to give me a hacked-up serial port:
a) on the SH-2 side it looked like a memory-mapped device (8 bit) with an enable/disable bit -- a value put there by the SH-2 instruction set would be emitted continuously as serial output at 9600 baud -- at no cost/overhead for the CPU;
b) on the RS-232 side it looked as a transmit-only device (TX, GND) using only 1 I/O line of the FPGA.

The RTL code already had 2 complete instances of RS-232 [umm, no flow control tho], so it was just a matter of adding another instance and hacking it.

So debugging boiled down to checkpointing the code: writing one ASCII character in one place of the code and another elsewhere, plus enabling/disabling the port as needed. I would monitor the output with Minicom and get an idea of what the kernel was doing and roughly for how long.
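In code it amounted to something like this (the addresses are made up -- the real FPGA memory map was project-specific):

#define DBG_DATA (*(volatile unsigned char *)0xA4000000) /* hypothetical */
#define DBG_CTRL (*(volatile unsigned char *)0xA4000004) /* hypothetical */

static inline void ckpt(char c)
{
   DBG_CTRL = 1; /* enable: start emitting */
   DBG_DATA = c; /* repeated at 9600 baud until overwritten */
}

Seeing "A" but never "B" on Minicom told me exactly which stretch of kernel code I was stuck in.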

Yes, barbaric, but one must make do with what one can scrounge.

-ulianov

Wednesday, March 21, 2012

Debugging a Hard Lockup with RTAI

I have an embedded box running Linux 2.6.29/RTAI/RTnet. I hacked the e1000e driver for use with RTnet and also eviscerated its ISR as (due to PCI limitations) its IRQ sits on the same line as an FPGA IRQ my client is using.

The FPGA RTAI ISR calls the old e1000e ISR by
a) signalling a semaphore, which
b) wakes an RTAI task, which
c) calls the e1000e ISR.

Sometimes when using this hacked driver the machine would lock up hard and we had no clue.

I found out about the "nmi_watchdog=1" kernel option, recompiled the kernel with IO-APIC (fortunately this target has an APIC) and observed. After 5 seconds the answer came on a silver platter:

BUG: NMI Watchdog detected LOCKUP on CPU0, ip f8a4d620
Pid: 990, comm: sh Tainted: P (2.6.29.6-apic-ipipe #37)
EIP: 0060:[<f8a4d620>] EFLAGS: 00000046 CPU: 0
EIP is at rt_sem_signal+0xf4/0x3d6 [rtai_sched]
EAX: 3fffffff EBX: 00000010 ECX: f8a5aae0 EDX: 3fffffff
ESI: f8a640e0 EDI: f8a63c60 EBP: f7b5be44 ESP: f7b5be30
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process sh (pid: 990, ti=f7b5a000 task=f78f2330 task.ti=f7b5a000)
I-pipe domain Linux
Stack:
00000246 00000000 00000010 00000000 00000003 f7b5be58 f8a630d5 f8a640e0
00000011 fa63e4c4 f7b5be70 f8e9b694 000001b8 c0297874 00000000 f7819580
f7b5be88 c01dc543 f74e0748 00000000 f781de40 00000000 f7b5bea8 c01537d1
Call Trace:
[<f8a630d5>] ? irqregistrar_switch_irq+0x59/0x61 [irqregistrar]
[<f8e9b694>] ? proprietarydriver_open+0x3b0/0x3ce [proprietarymodule]
[<c01dc543>] ? misc_open+0xf4/0x150
[<c01537d1>] ? chrdev_open+0xef/0x106
[<c015045c>] ? __dentry_open+0xfe/0x1eb
[<c01505e2>] ? nameidata_to_filp+0x2b/0x42
[<c01536e2>] ? chrdev_open+0x0/0x106
[<c015a146>] ? do_filp_open+0x337/0x648
[<c013595b>] ? __ipipe_restore_root+0x16/0x18
[<c014f060>] ? kmem_cache_alloc+0x79/0xb0
[<c0160a26>] ? alloc_fd+0x4c/0xb2
[<c0150258>] ? do_sys_open+0x46/0xed
[<c0150341>] ? sys_open+0x1e/0x26
[<c0102c07>] ? syscall_call+0x7/0xb
Code: 0f 84 16 01 00 00 8b 47 0c 83 e0 f3 89 47 0c 8b 47 0c 48 0f 85 03 01 00 00 8b 87 04 04 00 00 85 c0 74 42 8b 0d f4 ad a5 f8 eb 0d <8b> 89 14 03 00 00 8b 41 1c 85 c0 78 0a 8b 57 1c 8b 41 1c 39 c2
---[ end trace 4a04e77971ee2cfd ]---
Weird: my code was actually calling rt_mutex_unlock but that's aliased in RTAI to rt_sem_signal.

Decoding this was a bit of a puzzle as ksymoops is deprecated for 2.6. The offending instruction was
mov 0x314(%ecx),%ecx
which after a bit of disassembly turned out to be in enq_ready_task(), whose code is a bit counterintuitive (it looks at the semaphore's task queue).

Divining that my semaphore might be borked I looked at the code (a kernel module) that I wrote 2 years ago and it read
SEM pollSem;
which should always be initialised to zero at insmod time; it's just that it isn't.

So I changed the module_init code to
memset(&pollSem, 0, sizeof(pollSem));
rt_typed_sem_init(&pollSem, 1, RES_SEM | PRIO_Q);
which seems to have done the trick.


Apr 10 update: Nope, it did not work. The mutex was shared between RTAI and Linux tasks and releasing it from Linux confused the RTAI code -- maybe they only expected RTAI mutexes to be used solely in RTAI context. My problem was that I was trying to protect a resource (a callback registry) that was changed from Linux but used from RTAI. In the end I use it lockless in RTAI context (Linux is not running when RTAI is executing) and protect it with cli/sti in Linux context when it is manipulated, to make sure RTAI does not barge in.
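The Linux-side protection boils down to this kind of pattern (a sketch: the registry type and names are made up; rt_global_cli/rt_global_sti are RTAI's hard cli/sti):

typedef void (*callback_t)(void);   /* hypothetical */
static callback_t cb_registry[16];  /* hypothetical registry */

void register_cb_from_linux(int slot, callback_t cb)
{
   rt_global_cli();        /* hard cli: RTAI cannot barge in now */
   cb_registry[slot] = cb;
   rt_global_sti();        /* sti */
}

The RTAI side then reads cb_registry with no lock at all, since Linux is not running while RTAI executes.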

I would like to commend the RTAI people for the utter lack of documentation on mutexes or anything else, and for the sparse sample code that they so abundantly put forward for people to learn their wares. "If it was hard to write then it must be hard to use" seems to be their groupthink.

-ulianov

Thursday, March 15, 2012

Missed interrupts in Hard RT context (Linux/RTAI)

My client has an embedded box (a Core2 Duo SOM) which runs a realtime application under RTAI. The meat of it is in an ISR/IRQ which fires every 100 uS and which is triggered by an FPGA dangling off PCIe.

For the past year they have seen and characterised this "missed interrupt" problem:
a) after boot-up IRQs are missed for 12.5 minutes;
b) interrupt misses occur every 2.9999 seconds.

The working hypothesis was that this happens when SMI interrupts occur, but we looked through the Phoenix BIOS and did not find any setting for turning SMIs off. We turned off everything we could, including power management, to no avail.

The maker of the SOM was of no help.

The missed interrupts happened while a huge software stack was running: Linux, RTAI, a Java/AWT+X11 app. Hard to narrow it down.

We took a few steps:
1. boot to LILO prompt and wait for 13 minutes => still happened;
2. boot bare Linux and pause /sbin/init for 13 minutes => did not happen.

The LILO thing was interesting. So I reckoned I might do some stuff under good old DOS and see what happens (no drivers loaded).

I wrote this C (djgpp) program to observe. Basically I read a Pentium performance counter in a tight loop and do a bit of math, with interrupts off:
#include <stdio.h>
#include <stdlib.h>

// COMPILER-DEPENDENT CODE GOES HERE

#define SAMPLE_COUNT 1000

typedef struct {
   unsigned long long tsc;
   unsigned long long pass;
   unsigned long long dT;
} fault_t;

int faultc = -1;
fault_t faults[SAMPLE_COUNT] = {0};

int main(int argc, char* argv[])
{
   int i;
   int max_samples = SAMPLE_COUNT;
   unsigned long long pass = 0;
   unsigned long long last_ts = 0;

   if(argc > 1) {
      if(sscanf(argv[1], "%d", &max_samples) != 1) exit(1);
      if(max_samples > SAMPLE_COUNT) max_samples = SAMPLE_COUNT;
   }

   if(max_samples < 0) return 0;

   cli();

   last_ts = RDTSC();
   for(pass = 0; faultc + 1 < max_samples; pass++) { // leave room for the next sample
      unsigned long long dT = 0, tmp = RDTSC();
      if(last_ts > tmp) { last_ts = tmp; continue; } // overflow
      dT = tmp - last_ts;
      last_ts = tmp;
      if(dT > 350000) {
         ++faultc;
         faults[faultc].tsc = tmp;
         faults[faultc].pass= pass;
         faults[faultc].dT  = dT;
         if((faultc + 1) % 5 == 0) { fprintf(stderr, "!"); fflush(stderr); }
      }

      if(faultc >= 0 && (tmp - faults[faultc].tsc) > 10000000000LL) {
         fprintf(stderr, "TIMEOUT"); fflush(stderr);
         break;
      }
   }

   sti();

   for(i = 0; i <= faultc; i++) // include the last recorded fault
      printf("pass=%012lld T=%016lld dT=%016lld\n", faults[i].pass, faults[i].tsc, faults[i].dT);

   return 0;
}
For djgpp the low-level code goes:
static inline unsigned long long RDTSC(void)
{
   unsigned long lo, hi;

   /* serialise first: cpuid is a serialising instruction */
   __asm__ __volatile__ (
      "xorl %%eax,%%eax \n cpuid"
      ::: "%eax", "%ebx", "%ecx", "%edx");
   /* We cannot use "=A", since this would use %rax on x86_64 and
      return only the lower 32 bits of the TSC */
   __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));

   return (unsigned long long)hi << 32 | lo;
}

void cli() { __asm__( "cli" ); }
void sti() { __asm__( "sti" ); }
And bingo! I reproduced the observed behaviour down to the 12.5 min runtime. So this is not Linux's fault at all.

The djgpp code runs under DOS but in Protected Mode. The LILO test prompted me to run this code in Real Mode, but djgpp cannot generate such code (easily).

So I fetched OpenWatcom and modified the low-level code like so:
unsigned long long RDTSC()
{
   unsigned long long ret = 0;
   unsigned long hi = 0, lo = 0;
   __asm {
      rdtsc
      mov dword ptr lo, eax
      mov dword ptr hi, edx
   };
   ret = ((unsigned long long)hi << 32) | lo;
   return ret;
}
void cli() { _asm cli; }
void sti() { _asm sti; }
Watcom C's inline assembler syntax gave me some headaches.

This code ran in pure Real Mode with the same result. So it must be SMI.

I g00gled for RTAI+SMI and found out about the smi-module. Tried that but my hardware went undetected. Hmm. This module has a list of "known" hardware by PCI id, but that did not tell me much.

So I fetched the PCI ids database from SourceForge and had a look; interestingly, some of the known hardware entries were ISA bridge controllers. Hmm. Then I looked at my hardware and noticed one of those with id 0x8086:0x27bd. I added this to the lookup table and the module worked.

Indeed, with the SMI interrupts turned off we had no more mysterious missed interrupts and the saga was closed. I guess I was lucky this time.

-ulianov

Thursday, March 1, 2012

On Doing Ugly Linux Kernel Hacks for Need & Profit

A client has two targets ("new" and "legacy") and they insist they run the same system image on both. The two have subtle differences, the most annoying being that the legacy one does not support (U)DMA for CompactFlash cards.

Linux can be told to disable (U)DMA support by passing "ide-core.nodma=0.0" on the command line, but there is no way to re-enable it later on; performance is dead-slow, and these folks have tons of code that has to be loaded from the CF and they like fast boot times.

I cut some of their load time by upx'ing their XFree86 binary and proprietary app and that helped somewhat on both platforms. And it shrank their installer by 1M which is always a bonus.

I tried to muck with LILO but second.S is written in assembler (mine is quite rusty) and getting to the actual command line sector is yucky.

The fallback was to butcher the kernel. I have a simple target detection routine which uses cpuid; let's call it chkLegacyCPU().

I changed start_kernel() in init/main.c to append " ide-core.nodma=0.0" to the command line (which is now a global in main.c) if the legacy platform is detected.
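A minimal sketch of the idea (2.6.x; boot_command_line is the kernel's global, chkLegacyCPU() is ours, the elided lines stand for the rest of start_kernel):

asmlinkage void __init start_kernel(void)
{
        ...
        if (chkLegacyCPU())
                strlcat(boot_command_line, " ide-core.nodma=0.0",
                        COMMAND_LINE_SIZE);
        ...
}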

Also, the legacy target does not support booting in graphics mode (vga=8), so I modified main() in arch/x86/boot/main.c to force the VGA mode to NORMAL_VGA on the legacy platform.

Interestingly the two modifications are in two different parts of the kernel -- the one in x86/boot executes in x86 real mode and has a totally different software stack so I had to duplicate the code of chkLegacyCPU() for it.

-ulianov

Thursday, February 23, 2012

IEC GOOSE Application -- Network Latency On a RTAI/RTnet Box

I have an embedded box running Linux 2.6.29/RTAI/RTnet. I hacked the e1000e driver for use with RTnet and also eviscerated its ISR as (due to PCI limitations) its IRQ sits on the same line as an FPGA IRQ my client is using.

The FPGA RTAI ISR is calling the old e1000e ISR by
a) signalling a semaphore which
b) wakes an RTAI task which
c) calls the e1000e ISR.

On top of this there is an LXRT hard RT user-mode application (proprietary IEC GOOSE stack adapted by me to RTnet) which scoops all GOOSE packets from the wire.

Alas, rteth0 must be put in promiscuous mode: Ethernet cards generally only allow 4 or 8 multicast MACs to be programmed in their filter list, and GOOSE frames are emitted with the destination MAC being the same as the source MAC plus the multicast bit turned on. So we can easily shoot past 4/8.
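In C terms each publisher derives its destination address like this (a sketch):

#include <string.h>

/* Derive the GOOSE destination MAC from the source MAC. */
static void goose_dst_mac(unsigned char dst[6], const unsigned char src[6])
{
   memcpy(dst, src, 6);
   dst[0] |= 0x01; /* set the group/multicast bit of the first octet */
}

Every publisher thus creates its own multicast group, so the handful of hardware filter slots runs out quickly -- hence promiscuous mode.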

The LXRT application also knows how to send GOOSE packets.

The application is hooked to a Linux/RTAI driver which executes every 100 uS and does stuff. The comms between the app and driver are
a) RX -- RTAI shared memory for the incoming data and a volatile change counter
b) TX -- an RTAI semaphore signalled by the driver and RTAI shared memory for the outgoing data.

So the app stuffs data (GOOSE points) into the driver and sends driver points packaged as GOOSE frames.

One of the things the driver can be programmed to do is to respond to incoming point changes by reflecting them into outgoing points.

So the data flow is
e1000e->RTnet->LXRT app->driver->LXRT app->RTnet->e1000e
Using Wireshark running on an external box we timed the response to a change in a GOOSE point and the answer came to ~1ms ±10%, which is not bad at all considering the PCIe latency, the scheduling latencies and the FPGA ISR -- the latter can account for up to 200 uS.

-ulianov

Friday, January 6, 2012

Simple/Elegant way to determine uptime on XP

This has been eluding me for a while... I've tried "net stats srv", WBEM, uptime.exe from M$, all of them sucking for use as CGIs under Xitami/WinXP.

Then I learned this trick on a Perl forum:
<?php
$DAY = 3600 * 24;

$stat = stat('c:\WINDOWS\bootstat.dat');
$mtime = $stat['mtime'];

$dT = time() - $mtime;

$d = intval($dT / $DAY); $dT -= $d * $DAY;
$h = intval($dT / 3600); $dT -= $h * 3600;
$m = intval($dT / 60); $dT -= $m * 60;
$s = $dT;

$pl = "s"; if($d == 1) { $pl = ""; }
printf("up %d day%s %d:%02d:%02d\n", $d,$pl, $h,$m,$s);
?>
-ulianov

Tuesday, December 6, 2011

Perl::Tk Minimize to Systray!

Finally!

After searching fruitlessly on the Perl mailing lists for a free alternative to PerlTray (from the ActiveState PDK) I have banged together my own Rube Goldberg-esque contraption which works!

I am doing it the wrong way: one thread (main) minds the Tk GUI and another (asynchronous) minds the SysTray icon. They send one another (window) messages in a very crude fashion which happens to be OK and somehow works cross-thread.

This may make a Win32 artist cry and that's what makes it sweet.

Also I am disrespectfully meddling with the innards of Win32::GUI::NotifyIcon. Life's good.

Enjoy the code! (You need to supply a valid Win32 ICON file named icon.ico.)
#!perl.exe

use strict;

use threads;
use threads::shared;

use Tk;

use POSIX ();
use File::Basename qw[dirname];
use File::Spec::Functions qw[ catfile rel2abs updir ];

use Win32::API;
use Win32::SysTray;

Win32::API->Import("user32", 'ShowWindow', 'II', 'I');
use constant SW_HIDE    => 0;
use constant SW_RESTORE => 9;
sub win_op($$)
{
   my $top = shift || return;
   my $op  = shift;
   ShowWindow(hex($top->frame), $op);
}

Win32::API->Import("user32", 'MessageBox', 'NPPI', 'I');
use constant MB_OK       => 0;
use constant MB_ICONSTOP => 16;
sub errorMessageBox($$)
{
   my $title = shift;
   my $msg   = shift;

   MessageBox(0, $msg, $title, MB_OK|MB_ICONSTOP);
}

sub Die
{
   errorMessageBox('Sample SysTray', join('' => @_));
   POSIX::_exit(1); # NOTREACHED
}

my $Tray : shared;
sub tray_hide()
{
   return unless defined $Tray;
   my ($handle, $id) = split /:/ => $Tray;
   Win32::GUI::NotifyIcon::_Delete($handle, -id => $id); # HACK!!!
}
sub tray_thread($)
{
   my $top = shift || return;

   my $tray = new Win32::SysTray (
      name   => 'Sample SysTray',
      icon   => rel2abs(dirname($0)).'\icon.ico',
      single => 1,
   ) or Die 'Win32::SysTray failed!';

   $tray->setMenu (
      "> &Show" => sub { win_op($top, SW_RESTORE); },
      ">-"      => 0,
      "> E&xit" => sub {
         $tray->{Tray}->Remove();
         POSIX::_exit(0); # CORE::exit makes Tk barf
      },
   );

   my $t = $tray->{Tray};
   $Tray = $t->{-handle}.':'.$t->{-id};

   $tray->runApplication;
}

sub main()
{
   my $mainw = MainWindow->new(-title=>'Sample SysTray');

   async { tray_thread($mainw); };

   $mainw->OnDestroy(sub {
      tray_hide();     # else we have a zombie SysTray icon
      POSIX::_exit(0); # should kill background threads
   });

   my $fr = $mainw->Frame->pack(-side => 'bottom', -fill => 'x');

   $fr->Button(-text => "Exit",
               -command => sub { exit(0); }
   )->pack(-side => 'right');

   $mainw->bind('<Unmap>', sub { win_op($mainw, SW_HIDE); } );

   MainLoop();
}

main();
-ulianov

Thursday, November 24, 2011

On using UPX on Static linux-i386 Binaries

On an embedded target I work on, binaries are stored on a CF card which has about 1.8M/s read speed. However the static binary in question is about 40M uncompressed or 8M gzip'ed.

As in the old Stacker days the box can decompress faster than it reads from CF so a compromise has been reached: store the binary gzip'ed, decompress to /tmp (a ramdisk) and run from there.

This embedded system does not have swap enabled, but in low-mem situations the kernel uses demand paging for the r/o pages in the .text area of a binary, i.e. it steals LRU code pages from in-core knowing they will be found in the on-disk binary. This is why one gets a "Text file busy" error when one tries to alter a binary which is running.

In our case we end up with basically two copies of the binary in-core (this is a GCJ-compiled Java app so the .text is fairly substantial, tho 90% of it is junk).

Then I took the upx -9 route and the results are quite interesting. The compressed binary shrank to 7M which means faster load time and a smaller software installer.

Here is some sample C code:
#include <stdio.h>
#include <unistd.h>

int main()
{
   char c = 0;
   printf("Press ENTER:"); fflush(stdout);
   read(0, &c, 1);
   return 0;
}
The static stripped binary is 377,204 bytes and the upx'ed static binary is 174,880 bytes. size(1) reports for a.out:
   text    data     bss     dec     hex filename
 371111    3144    4448  378703   5c74f a.out
Running and suspending the binaries we get:
          a.out     a.upx
VmSize   516 kB    524 kB
VmLck      0 kB      0 kB
VmRSS    124 kB     96 kB
VmData   140 kB    508 kB
VmStk      8 kB     12 kB
VmExe    364 kB      4 kB
VmLib      0 kB      0 kB

So upx moves the code from .text to the data of the running binary. Demand paging bye-bye, but at least we don't (theoretically) keep two copies of .text in core.

In practice Linux cheats and does not fault in all the pages of the binary when loading it... it loads enough to make it start and is lazy about the rest... if the binary needs those pages they will be faulted in later.

Or this is a bed-time story for bearded UN*X hackers.

-ulianov

Wednesday, September 21, 2011

A Thing of Beauty!

[screenshot of the router's status page, showing an uptime of 1340 days]
Check out the uptime in the screenshot -- it's 1340 days which is 3 years and 8 months. Take that N3tcraft!

This is the router I've used for my backup DSL line for 2 years. True, I disconnected the line 18 months ago, but the box is still chugging along in a closet in my basement thanks to a decent UPS and my neglect ;-)

Alas I have to move this box to a new location and possibly decommission it.

-ulianov

Tuesday, August 23, 2011

A Fiercer Way To Detect a CD-ROM/DVD Driver Letter Under Cygwin/MinGW

This works even under MinGW and has a wicked bit of AWK to parse Unicode cr*p:
#!/bin/sh
reg query 'HKLM\SYSTEM\MountedDevices' | \
awk 'BEGIN { letter=""; }
/DosDevices\\[D-Z]:/ {
   d=$1; a=$NF;
   dr=substr(d, length("\\DosDevices\\")+1, 1);
   i=0; str="";
   while(length(a) > 0) {
      c=substr(a, 1, 2); a=substr(a, 3);
      if((++i%2)==0) { continue; } # skip the high byte of each UTF-16 pair
      str = str sprintf("%c", strtonum("0x" c));
   }
   if(verbose) { print dr ": " str > "/dev/stderr"; }
   if(tolower(str) ~ /cdrom/) { letter=dr; }
}
END {
   if(length(letter) > 0) { print letter; }
   else { exit 1; }
}'
-ulianov

Monday, August 22, 2011

On Guilty Perl/Win32 Pleasures

I've been messing with PerlApp-packaged GUI Perl apps for a while and I was annoyed that stderr output (useful when debugging) was not available when the exe type is set to Win32.

I have just remembered an obscure W*ndows feature: debug messages (a lame-arse feature cloning syslogd(8) and only viewable in a debugger). So I set myself to use this, having found a debug message viewer: http://alter.org.ua/soft/win/dbgdump/DbgPrnHk_v9a_all.rar.

The question was how to log to stderr when running as a console app (under perl.exe) and to the debug message subsystem when running as a non-console app (under wperl.exe)?

I found no direct answer but kernel32!GetConsoleTitle can be used in an indirect way to answer this question:
#!perl -w
use strict;
use Win32::API;
use File::Basename qw(basename);

my $myself = basename($0);

Win32::API->Import("kernel32", 'OutputDebugStringA', 'P', 'V');
Win32::API->Import("kernel32", 'GetConsoleTitle', 'PN', 'I');

sub DbgPrint
{
   OutputDebugStringA("$myself\[$$\]: ".join('' => @_)."\r\n");
}

sub isConsole()
{
   my $title = 'x' x 128;
   my $r = GetConsoleTitle($title, 128);
   return if $r == 0;
   return if $title =~ /^x+x$/;
   return 1;
}

sub Log
{
   not @_ and return;
   return print(STDERR @_, "\n") if isConsole();
   return DbgPrint(@_);
}

main::Log "Testing";
I must say that Dave Roth's book Win32 Perl Scripting: The Administrator's Handbook was an eye opener to all sorts of deliciously perverse Perl/Win32 programming tidbits.

-ulianov

P.S. Why not use the Event Log subsystem? Because it sucks even more than the debug messages subsystem! And because I like obscure features and because the debug messages are not on-disk persistent.

Friday, August 19, 2011

A Smart but Neglected BASH Feature

I had a humongous shell script and a wish to redirect stderr for a boatload of commands in one fell swoop. I could have used the {} grouping but for obscure reasons it was not appropriate.

The answer was to write a Bash extension in C which takes advantage of two things:
a) Bash does not fork(2) when it executes an extension so it's in-process;
b) dup2(2)
so it's possible to do funky things with the file descriptors and get away with it.

Alas, on Win32 Cygwin's bash does not load this extension (MinGW's does) and dup2(2) is just borked. Blame M$ for designing a braindead C library and OS.

Here's the code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <asm/fcntl.h>

#include "builtins.h"
#include "shell.h"

// Compile on Linux/x86:
//   gcc -I/tmp/bash-3.2 -I/tmp/bash-3.2/include fdmangle.c -fpic -c
//   ld -x -Bshareable -o fdmangle.so fdmangle.o
// Use in shell scripts:
//   enable -f ./fdmangle.so fdmangle
//   fdmangle 2 stderr

extern char **make_builtin_argv(); // Bash-ism

static int verbose = 0;

static int fdmangle_main(int argc, char **argv)
{
   if(argc < 3) {
      return 1;
   }

   int n = 1;
   if(!strcmp(argv[1], "-v")) { verbose = 1; n++; }

   if(verbose && argc < 4) {
      return 1;
   }

   const int fdn = atoi(argv[n]);
   const char* file = argv[n+1];

   if(verbose) fprintf(stderr, "fdmangle %d -> %s\n", fdn, file);

   int flags = O_CREAT | O_WRONLY | O_APPEND;
#ifdef __unix__
   flags |= O_NOFOLLOW;
#endif
   int fd = open(file, flags, 0640);
   if(fd < 0) {
      fprintf(stderr, "Cannot open for writing %s: %d (%s)\n", file, errno, strerror(errno));
      return 1;
   }

   dup2(fd, fdn);

   return 0;
}

static int fdmangle_builtin(WORD_LIST *list)
{
   char **v = NULL;
   int c = 0, r = 0;

   v = make_builtin_argv(list, &c);
   r = fdmangle_main(c, v);
   free(v);

   return r;
}

static char* fdmangle_doc[] = {
   "File descriptor mangling",
   (char *)0
};

struct builtin fdmangle_struct = {
   "fdmangle",
   fdmangle_builtin,
   BUILTIN_ENABLED,
   fdmangle_doc,
   "fdmangle [-v] fd file",
   0
};
-ulianov

Monday, August 8, 2011

How To Detect a CD-ROM/DVD Driver Letter Under Cygwin

While automating a provisioning process I stumbled upon this issue... How do you know what drive letter is a CD-ROM?

The first try was:
grep -qw iso9660 /proc/mounts || return 1;

echo $(mount | awk '/iso9660/{print $3}')
which happens to work when a disk is inserted in the drive.

At my deliverables demo the disk was not present, so bummer.

The second uses the Registry and works (in XP):
cdrom='';

for dr in D E F G H I J K L M N O P Q R S T U V W X Y Z
do
   [ -e /proc/registry/HKEY_LOCAL_MACHINE/SYSTEM/MountedDevices/%5CDosDevices%5C${dr}%3A ] || continue;
   tr -d '\00' < /proc/registry/HKEY_LOCAL_MACHINE/SYSTEM/MountedDevices/%5CDosDevices%5C${dr}%3A | grep -qi cdrom || continue;
   cdrom="$dr";
done
[ -z "$cdrom" ] && return 1;
echo /cygdrive/$(echo $cdrom | tr 'A-Z' 'a-z');
Brutal, eh?

Anyways
grep -qw iso9660 /proc/mounts || return 1;
is a great way to check whether a disk is in the unit (any unit).

-ulianov

Wednesday, July 27, 2011

The Scripting of ssh(1) Conundrum

Now as before I am trying to do automation of remote SSH operations. ssh(1) is particularly unfriendly to that. Ditto scp(1).

It can be done using a masochistic trio:
1. set SSH_ASKPASS, provide an askpass program;
2. set DISPLAY to something;
3. run ssh under setsid(1)
but that precludes access to $?

In the past I've worked around that by using Perl's Net::SSH::Perl (very slow) or by using Dropbear which is MIT-licensed.

This time, as my automation Bash script must run under both Linux and Cygwin/Win32, I have elected to use plink/pscp which are native to Windows, lend themselves easily to scripting, mirror stdin/stderr, and which to my pleasure compile cleanly on Linux and work as designed.

One leg up for Simon and legacy boos to Tatu and the OpenBSD team.

-ulianov

Tuesday, June 21, 2011

Using a WindMobile Huawei E1691 USB stick in Linux

This piece of Chinese-designed cr*p is a dual-use USB stick, i.e. it has a dual personality as a USB device. In its "native" state it looks like a CD-ROM (so you can install the drivers in Windows and MacOS off it) yet it has to be reset with a "secret handshake" to look like a COM port.

I g00gled for this and found no good answer but some good bits. And here is how they came together:
1. you need the usbserial and a recent copy of the option [GSM driver] modules loaded;
2. the reset sequence (via usb_modeswitch):
DisableSwitching=0
EnableLogging=1
# Huawei E1691
DefaultVendor= 0x12d1
DefaultProduct= 0x1446
TargetVendor= 0x12d1
TargetProduct= 0x140c
MessageContent="55534243000000000000000000000011060000000000000000000000000000"
CheckSuccess=5
3. I connected via wvdial; this is the config file for it [it's Wind-specific]:
[Dialer Defaults]
Init1 = ATZ
Init2 = ATQ0 V1 E1 S0=0 &C1 &D2 +FCLASS=0
Init4 = AT+CGDCONT=1,"IP","broadband.windmobile.ca"
Modem Type = Analog Modem
Baud = 115200
New PPPD = yes
Modem = /dev/ttyUSB0
ISDN = 0
Phone = *99#
Password = gprs
Username = gprs
I used Knoppix 6.4.4 to connect and it worked beautifully.

Mind you, you need good signal (which can be a problem with Wind).

VMware/Windows note: if you run Linux in a VM and you assign the USB stick to it then when you use usb_modeswitch it will cause the stick to re-enumerate and kick some "Found New Hardware" dialogs in Windows. Ignore them (Cancel).

-ulianov

Tuesday, May 17, 2011

MTU/Fragmentation Strikes Again

I have this strange setup:

A <---OpenVPN/TCP over SSH--> B <--localnet--> C

:5432 ---DNAT---> :5432
in which host A is a VPS server somewhere in Illinois and hosts B & C are at home.

Host A needs to connect to a PostgreSQL server running on C, but for obscure reasons I do not want to run full routing/masquerading on B, so I put in a DNAT rule so that A connecting to B:5432 in effect talks to C:5432.

I had problems with a SQL insert A->C (it was the body of an e-mail). My test case message had just a few bytes in the body so the INSERT was completing A-OK.

However in real use this insert was taking forever and my Milter was timing out as a result (brr). Debugging on A was rather harsh as it's a VM with SELinux enabled, so many things around ptrace(2) are borked.

After a few missteps I divined that the MTU for the VPN interface was the default 1500 and, since the link A->B is point-to-point (A is blissfully unaware of C's existence), A would never perform a path MTU discovery.

The fix was to lower the MTU to 1300 (be on the safe side as I did not bother to measure the overhead of the SSH envelope) on A and B.

-ulianov

Monday, May 16, 2011

Compressed Output from Bash CGI Scripts

I have blogged before on how to compress the output of Perl CGIs. As I've started to use an Android phone I learned that some of my status pages dump a heck of a lot of output. A few are written as Bash CGI scripts.

So here's how to repeat the trick in Bash:
#!/bin/sh

gz_pipe='cat';
echo $HTTP_ACCEPT_ENCODING | grep -qw gzip && {
   gz='ok'; gz_pipe='gzip -9f';
}

echo "Content-Type: text/html";
[ ! -z "$gz" ] && echo "Content-Encoding: gzip";
echo '';

{
   echo '<html><PRE>';
   set
   ps axf
   echo '</PRE></html>';
} | $gz_pipe
For user agents that do not accept gzip-encoded output we use cat(1) as a straight pass-thru, as there is no 'nice way' to put nil after a pipe sign (|) in Bash.

-ulianov

Wednesday, March 2, 2011

On Webapps

I have always loathed GUI apps and especially HTML webapps. And this is not for lack of trying (I once wrote an asset-tracking app using HTML 3.2 and Bash shell scripts as backend).

At my current contract I was asked to write a custom webapp for time tracking (alas nothing was available as open source which fulfilled the requirements). At my previous position I have seen my co-workers build a fairly beefy webapp as a network appliance configurator using YUI2. I tried to stay away from it as I was having more fun writing C++ backends.

Now I had to do it front and back so I wrote some Java servlets that interact with a MySQL db which a) act as XML-RPC endpoints so one can set values and b) as XML generators [SQL table format to XML/table format translators] for data presentation. I tried to keep the Java back-end as straight-thru as possible.

I managed to keep most of the business logic in SQL as materialised views. Yippie!!

I sinned a bit as I provided a thin Perl layer that sometimes takes the raw XML output from Java and spits out JavaScript or JSON. I had to as the servlets run on Jetty on localhost:8080 so there was no direct access from the browser.

I could have configured Jetty as a full web server but it's a royal pain in the arse to do so.

For the front-end I went the full hog with YUI2! I used a bit of extra JavaScript to do the XML-RPC (and I got a buggy client implementation to contend with) and JSON bits. I also sinned a bit with JQuery.

Other than that I've done it screen by screen using YUI and DataTable (DataSource is horribly interlinked with DataTable -- for the life of me I could not beat a DataTable into being used as stand-alone; so I went JSON/JQuery). In the end I assembled the individual/standalone pages using a TabView and iframes (evil, I know).

The webapp actually looks good and is pretty snappy once the JIT compiler kicks in. For the Perl bits I configured mod_perl (with a lot of preloaded Perl modules) as I did not like the 100 ms hit I got every time I was calling a Perl cgi.

I handled a form of basic authentication using Apache's .htaccess, mod_rewrite for setting the access rights (sigh) and Perl for session handling. The browser is handed a random cookie which represents the session.

I deployed the whole shebang on an OpenSuSE 11.0 VM I had lying around and it only took me a 1G .vmx file to do so.

So I am happy with the result and with the YUI2 capabilities. I even added a flourish of a YUI2/Flash chart to show availability levels. I did not have to sweat a bit on HTML and JavaScript to build up a good-looking functional GUI, it worked out of the box in Firefox and IE6 (yes, I work for such an unupgradable corp) and it only took two weeks to build.

-ulianov

P.S. There's talk of AD authentication but that ought to be handled by an Apache module if AD has LDAP well configured.

Thursday, October 21, 2010

Porting the Linux e1000e Driver to RTnet

My client just switched his hardware to a SOM which comes with an on-board E1000 PCIe chip. The RTnet (or Linux) e1000_new driver did not recognize the card so I had to hack the e1000e driver.

Here are the steps (mostly in netdev.c):
- borrow Makefile from RTnet e1000_new;
- junk Ethtool code, disable IPv6;
- short out all of the TCP offloading code -- this is a real time networking stack;
- rip out the Linux stack calls and replace with RTDM if applicable;
- ditto for skb->rtskb;
- kmalloc / kcalloc / kzalloc wrapped for RTDM;
- hash-define rtskb_copy_to_linear_data, rtskb_copy_to_linear_data_offset to some memcpy;
- pre-allocate a pool of 256 rtskbs for RX and initialise it;
- changed IRQ allocation to legacy (not MSI);
- connect to STACK_manager in e1000_open;
- added an I/O RT task which is the RTnet bottom half -- it waits on a semaphore, calls e1000_clean() and, if packets were received, signals RTnet to process them (see the sketch after this list);
- changed the ISR (legacy) to signal the semaphore above if the IRQ belongs to the driver;
- removed the QoS stuff;
- removed the VLan stuff;
- disabled multicast code -- RTnet's support for it is flakey;
- disabled set MAC code -- RTnet does not support it;
- disabled skb frags code -- RTnet does not support it;
- disabled change MTU code -- RTnet does not support it;
- disabled power management code -- RTnet does not support it;
- modify RTnet's configure to have "--enable-e1000e".
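The bottom half boils down to something like this (a sketch from memory -- rx_sem and the exact RTnet/RTAI call names are how I remember them, not gospel):

static SEM rx_sem; /* signalled from the legacy ISR */

static void e1000e_bh_task(long arg)
{
   struct e1000_adapter* adapter = (struct e1000_adapter*)arg;

   while(1) {
      rt_sem_wait(&rx_sem);                  /* woken by the ISR       */
      if(e1000_clean(adapter))               /* reap RX/TX descriptors */
         rt_mark_stack_mgr(adapter->netdev); /* let RTnet process them */
   }
}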

I spent most of the time trying to compile the hacked code. After I got an IRQ kicking I spent a lot of time making sure the driver is stable and that inserting/removing its module does not thrash the box.

The ported driver (base kernel version is 2.6.29) is here.

-ulianov

Friday, October 15, 2010

A Very Nice and Very UNIXy Bug

I encountered a very puzzling bug in a user-land app I helped write. I wrote the socket comms library and the database backend (using SQLite3).

Once in a blue moon one of the sqlite3 databases would become corrupted and would be written off by the integrity_check pragma. The first thing to blame was sqlite3 with multithreading (tho I had it compiled for that). It ended up not being this.

What happened was soo much better: the socket library was being used by a thread pool, so when a new request came in it was handed to a worker thread which had to talk to some innards of the app and formulate a result. In the meanwhile the accepted socket was being kept open.

The comms library was used on the client side by ephemeral CGIs that were called by a YUI2 webapp. The web server was thttpd -- it has a CGI timeout cut-off after which it kills the child CGIs.

The innards of the app could sometimes take longer to respond than the CGI cut-off time, so the CGI vanished and the client socket was closed. But on the app side (=server) the TCP socket was still being used. When the innards finally finished doing whatever they were doing, the data would be written to the dangling socket using write(2).

But in the meanwhile, in another galax^H^H^H^Hthread, a sqlite3 db would be opened, used and closed. UNIX has a policy of reusing the lowest numerical file descriptor that becomes free.

See the conflict? The worker thread would write some late-arriving data to what it thinks is a socket but is now a database file descriptor!

I fixed this by changing all read/write calls in the comms library to recv/send -- the latter pair only works on sockets. Also, for added paranoia, I sprinkled the comms code with getpeername(2) and logged a critical error if a socket descriptor did not look like a socket.
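The paranoia check is cheap (a sketch):

#include <sys/socket.h>
#include <errno.h>

static int is_still_a_socket(int fd)
{
   struct sockaddr_storage ss;
   socklen_t len = sizeof(ss);

   if(getpeername(fd, (struct sockaddr*)&ss, &len) == 0) return 1;
   return errno != ENOTSOCK; /* ENOTCONN et al. still mean "a socket" */
}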

Only took me two days to get to the bottom of this.

-ulianov

Tuesday, May 25, 2010

How to Measure the Duration of a RTAI Semaphore Operation

Today I was asked the question "How long does it take to signal a semaphore?".

This is important as I had to do it in an ISR [which serviced a quasi-timer IRQ] to signal an RT task to start doing a job (this is akin to Linux tasklets).

Here is how I measured it (interrupts are disabled to keep the measurement accurate):

#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/kernel.h>
#include <linux/types.h>

#include <rtai_sem.h>
#include <rtai_sched.h>

SEM pollSem;

static int __init testsem_init_module(void)
{
   rt_typed_sem_init(&pollSem, 0, BIN_SEM | FIFO_Q);

   rt_global_cli();

   volatile unsigned long long ticks_start;
   __asm__("rdtsc\n\t"
           "mov %%edx, %%ecx\n\t"
           :"=A" (ticks_start));

   rt_sem_signal(&pollSem);

   volatile unsigned long long ticks_end;
   __asm__("rdtsc\n\t"
           "mov %%edx, %%ecx\n\t"
           :"=A" (ticks_end));

   rt_global_sti();

   long long dT = ticks_end - ticks_start;

   rt_sem_delete(&pollSem);

   printk(KERN_DEBUG "rt_sem_signal took %lld ticks\n", dT);

   return -EBUSY;
}

module_init(testsem_init_module);
MODULE_LICENSE("GPL");
The answer was (on a PIII/1.2GHz) about 300 cpu ticks which is ~0.25 uS.

-ulianov

Porting the Linux e100 Driver to RTnet

My client has been using the E100 Intel card that came with his embedded mobo. When using the RTnet driver (eepro100, an antique version of the Becker driver which did not do firmware download to the card) they were experiencing strange TX lockups which could only be cured by a power cycle.

It was either trying to fix the eepro100 driver (and maybe download firmware borrowed from the newer e100 driver) or port the stock Linux e100 driver which had no lockup issues, quod fecit.

Here are the steps (in e100.c):
- alter Makefile, Makefile.in from RTnet;
- junk Ethtool code, disable IPv6;
- junk eeprom write code;
- junk loopback test code;
- rip out the Linux stack calls and replace with RTDM if applicable;
- ditto for skb->rtskb;
- replace schedule_work with a semaphore and rtdm_nrtsig_pend()
- kmalloc / kcalloc / kzalloc wrapped for RTDM;
- pre-allocate a pool of 256 rtskbs for RX and initialise it;
- connect to STACK_manager in e100_open;
- added I/O RT task which is the RTnet bottom half -- waits on a semaphore, calls e100_poll() and if packets received signals RTnet to process the packets;
- changed the ISR (legacy) to signal the semaphore above if the IRQ belongs to the driver;
- disabled set MAC code -- RTnet does not support it;
- disabled change MTU code -- RTnet does not support it;
- modify RTnet's configure to have "--enable-e100".

I spent most of the time trying to compile the hacked code. After I got an IRQ kicking I spent a lot of time making sure the driver is stable.

The ported driver (base kernel version is 2.6.29) is here.

-ulianov

Monday, April 19, 2010

To Have a CLI or Not?

Any embedded product for the telco market will eventually have a CLI. Most have Web GUIs but they are not good enough as scripting a JavaScript-heavy GUI is hell.

So customers usually demand a CLI so they can script their operations and perhaps make the configuration job easy -- any network admin worth his salt dislikes clicking a 1000 times to bring a system up to scratch.

Cīsco reigns king here. Everybody in the industry is familiar with their style of CLI and expects it. (The alternative is to give access to the native OS commands, i.e Linux, but this can be dangerous and does not make the configuration job easier.)

In a previous life I have seen this implemented ad-hoc in C (using ncurses) and in a different shop a horrendous mess made with Python (the latter was quasi-unmaintainable). Both had problems and did not conform to the Cīsco style.

Luckily a smart cookie published an LGPL libcli (written in C) which actually works well.
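Getting a Cīsco-ish prompt out of it takes only a handful of calls (a sketch from memory of the libcli API -- check the headers for the exact signatures; the product strings are made up):

#include <libcli.h>

int cmd_show_ver(struct cli_def *cli, char *command, char *argv[], int argc)
{
   cli_print(cli, "FooBox v1.0"); /* hypothetical product string */
   return CLI_OK;
}

void serve(int sockfd)
{
   struct cli_def *cli = cli_init();
   cli_set_banner(cli, "FooBox CLI");
   cli_register_command(cli, NULL, "show", cmd_show_ver,
                        PRIVILEGE_UNPRIVILEGED, MODE_EXEC, "Show version");
   cli_loop(cli, sockfd); /* works on a serial port fd as well */
   cli_done(cli);
}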

The only modification I had to make [and submit back] for adapting it to a serial port was to cut out some funny characters it was sending at the beginning of the session.

The downside was that I had to convince the powers that be in my company that this won't blow up their IP. Also I had to statically link in libsqlite3 so I can manipulate the password database, which pushed the size of my custom CLI to 0.5 megabyte.

-ulianov

Tuesday, March 23, 2010

M$ Vestigials: nmake and CMD.EXE

I was trying to integrate my Linux PPC build with TFS build and the only way I found this possible was via a "nmake" VS project. This is just contorted: a VS 2010 project wrapping an nmake Makefile is promoted to "build" status.

All this fuss just to map a Samba share, SSH into the Linux build machine and perform the build (the denx cross-compiler we use is only hosted on Linux).

The nmake is a pale and withered imitation of the one true make, the GNU make.

And I am not saying it for lack of trying: nmake does not pick up environment variables correctly and it certainly does not allow me to say
TIMESTAMP = $(shell unixdate '+%%Y-%%m-%%d_%%H.%%M.%%S')
Also it does not follow the respected idiom CC ?= gcc (i.e. set the CC variable to "gcc" if it wasn't set already in this Makefile or the environment).

This may seem obscure but if one wants to get TFS to get a snapshot onto uniquely named directory onto a Linux samba share then one is out of luck.

Which brings me to the need to have a CMD batch file to handle all this. The backticks implementation in CMD is heinous:
for /F "usebackq" %%a IN \
(`"unixdate +%%Y-%%m-%%d_%%H.%%M.%%S"`) \
do @set TIMESTAMP=%%a
Yes, this is the wrong way to do stuff on a Win32 machine but you do what you have to do to get the job done.

-ulianov

Monday, March 22, 2010

When make(1) Starts Spinning in Circles

I have this Linux project where the "depend" Makefile target reads:
${DEPDIR}/%.d:  %.cpp
	@echo "Determining dependencies for $<"
	@${CXX} ${CPPFLAGS} -E -MM $< | \
	  sed -e 's~\($*\)\.o[ :]*~\1.o $@ : ~g' > $@
In one instance this snippet would start to run in an infinite loop.

On a closer look it turned out that on the build machine where I was trying to run this, the system clock was waaay behind the `real' time and there was no hope in h*ll of getting it sync'ed as the corporate firewall blocks S/NTP.

The files were checked out of TFS via Samba which somehow preserves the time stamp of the client Win32 machine (whose clock was right).

With the clock behind, all build artefacts were indeed older than the files that came out of revision control, thus the infinite loop.

The solution is rather simple (as this is build machine, not a dev machine):
# The time on the Linux build machine may drift
# and be behind the time of the TFS agent. Yet when
# files are deposited on the Samba share their
# time stamps come from the agent and that can
# be "newer" than any file we produce locally
# on the Linux machine, thus triggering an INFINITE loop.
#
# We time stamp all files with the (same) local time
# stamp to avoid this.
CO_TIME=$(date '+%Y%m%d%H%M'); export CO_TIME
find -type f -exec touch -t $CO_TIME {} \;
unset CO_TIME
What's frustrating is that I had this problem 4 years ago when I did not have this blog to remind myself of build oddities.

Also during the same exercise I learned that the Win32 DEL command does not cope well with symlinks on the Samba share, so the build needs a small assist at the end:
find -type l -delete
-ulianov

Thursday, March 18, 2010

Speeding up XFree86 Startup on a x86 Target

I have an embedded motherboard with (I think) the i945G graphics chipset. XF86 4.6.0 would take 11 seconds of blank screen to start up, which is unacceptable for an embedded appliance [the user might think that the system went belly-up].

While playing with vesafb I learned that if I start Linux in graphics mode vga=0x314 (which also enabled me to claim half of the boot screen with the company logo) the start up time of the XF86 (same server) is cut to only 1.5 sec.

Prior to this I tried Xvfb which starts instantaneously but is horrendously slow.

-ulianov

P.S. Bolting the logo into the kernel using a 224 colour ppm ASCII bitmap is gross.

Tuesday, March 16, 2010

Circumventing a Corporate Firewall

Having worked in the NA corporate world for a while now I can list a few ways of circumventing firewalls and freely communicating with the outside world.

Keep in mind that corporations have edge firewalls and restrictive firewalls on the users' computers which in most cases run Windows XP.

Here are a few ways I got it working for edge firewalls:
1. running OpenVPN over UDP/33400 [apparently this port and a few after it are used by tracert];
2. running OpenVPN over TCP/21: this is used for FTP and some corporations allow direct FTP connections;
3. running OpenVPN over TCP/1194, which comes as no surprise at this is the IANA OpenVPN port;
4. tunnel over HTTPS with CONNECT but this is short-lived.

Sometimes when one sells a device which needs to be controlled over Ethernet it is next to impossible to drive it, for people have those pesky local firewalls.

Some esoteric ways to confound a local firewall would be:
1. use a Windows named pipe to talk to a Samba program that is the server for the named pipe;
2. use payloads for icmp-request and icmp-response, in effect implementing UDP over ICMP; some firewalls block incoming ICMP though;
3. make the device respond with icmp-destination-unreachable (with payload) -- these are never blocked, else the whole IP stack is thrashed;
4. use UPnP to exchange data.

-ulianov

Monday, December 14, 2009

How to Learn What Code is Actually Disabling IRQs?

When one applies the PREEMPT_RT patch the spinlocks & mutexes get horribly overloaded. cli/sti use is frowned upon (x86) so one ends up with macros of macros of macros for locking, and it is a headache to see what functions are actually disabling interrupts.

Yet we are on x86 and the locking sequence generated by GCC is simple:
cli
...
sti
or could be a bit more complicated:
pushf
pop %reg
cli
...
push %reg
popf
Luckily objdump and a bit of massaging came to the rescue in the form of a neat script that finds all the object files that make up the compiled kernel and then searches the disassembler output for these sequences.

Bonus: the function names that do the locking are also printed.

Here comes findcli_x86.sh [run it at the root of the kernel tree after you compile the kernel]:
#!/bin/sh

# This file is licensed under GPLv2

# Note this works only with x86 code

obj_list='obj';
trap "rm -f $obj_list" 0 1 2 15

function extractcli()
{
   file=$1;

   [ ! -r "$file" ] && return 1;

   [ -z "$OBJDUMP" ] && OBJDUMP=${CROSS_COMPILE}objdump

   $OBJDUMP -dSCl $file | awk 'BEGIN {
      idx="nosuchfunc"
   }
   /(^[a-zA-Z_]|cli[^a-z]|popf|sti[^a-z])/ && !/(file format|Disassembly)/ {
      if($0 ~ /^[a-zA-Z_]/) {
         idx=$1
      } else {
         F[idx]=(F[idx] "\n" "\t" $1 "\t" $NF);
      }
   }
   END {
      for(idx in F) {
         print (" " idx F[idx])
      }
   }'
   return 0
}

# main

find -type f -name '*.o' > $obj_list
for obj in $(cat $obj_list)
do
   o=$(echo $obj | sed 's/^\.\///g')
   [ "$o" = 'vmlinux.o' ] && continue; # this is the whole kernel
   echo $o | grep -q 'built-in\.o' && continue; # these are aggregations

   echo -en " \r$o" 1>&2
   cnt=$(objdump -dS $o | grep -cw cli)
   [ "$cnt" = '0' ] && continue;

   echo -en " \r" 1>&2

   src='???';
   c=$(echo $o | sed 's/\.o$/\.c/g')
   S=$(echo $o | sed 's/\.o$/\.S/g')
   [ -f "$c" ] && src="$c"
   [ -f "$S" ] && src="$S"

   echo "$o: $cnt, src: $src"
   extractcli $o
done
As usual awk comes to the rescue.

-ulianov

P.S. I looked for a decent disassembler for Linux other than objdump but found only ancient, broken ones. This is annoying, as sometimes I need to see what external functions are called by an object file (.o).

Monday, December 7, 2009

DTS Headaches on a PPC440EPx Board

I had problems trying to get an ST RTC chip recognized by Linux. The chip was attached to I2C bus 0 at address 0x68 and is supported by Linux, but the two did not talk.

This is a custom board made specially for my US employer. The vendor brought up U-Boot, tested the peripherals with it and that was about it. I tried the denx.de kernel in a Sequoia configuration with no luck; I hassled the board vendor and they gave me a Linux kernel that booted on the board but did not detect much of the hardware.

By playing with the kernel config I got it to sniff the NOR flash and to partition it using a command-line scheme, but the RTC chip remained a mystery.

I dug a bit into the kernel source and docs and learned that the kernel expects a flattened device tree from the bootloader, yet this U-Boot did not provide one. The vendor supplied a DTS file holding a textual representation of the devices, but it did not get the RTC right.

By poking around I learned that the PPC kernel matches the text labels ("compatible") provided by the device drivers against the labels in the device tree. I hacked those a bit and the RTC works fine now.
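
For illustration, the node ended up looking something like this sketch -- the unit address is the Sequoia-style IIC0 one and "st,m41t00" is just an example of the form such a compatible string takes; the real values depend on the board and the RTC part:
/* DTS fragment -- a sketch; addresses and compatible string are examples */
i2c@ef600700 {
        rtc@68 {
                compatible = "st,m41t00";
                reg = <0x68>;
        };
};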

-ulianov

Friday, December 4, 2009

Linux 2.6.31/PREEMPT_RT and Hard IRQs

I was trying to improve the latency of an ISR serving an IRQ which fires every 100 us. As a softirq, even when running at the highest priority, the latency would sometimes slip to 1500 us, which is unacceptable (dd if=/dev/hda of=/dev/null comes to mind as a culprit; hda is a slooow Compact Flash).

It does not help that the ISR crunches numbers and does floating-point calculations in interrupt context. This ISR munches about 25 us and should execute as close as possible to the beginning of the 100 us interval.

So I tried to use a hard IRQ (IRQF_NODELAY), which was not good, as the kernel would BUG_ON in rtmutex.c:807 when a process tried to read from the character device belonging to the driver which used the said IRQ.

I learned that spin_lock/spin_unlock are overloaded by PREEMPT_RT with rt_mutex_lock/rt_mutex_unlock, which do not seem to be compatible with hard IRQs: the mutex gets double-locked by the "current" process altho the two contenders are the hard IRQ and code executing on behalf of a user process. I think this is bonkers.

I managed to get it working by using the "atomic_" spinlock primitives, which basically boil down [in the disassembly] to cli/sti.
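
The shape of the fix, sketched with the mainline spelling (raw_spinlock_t and friends, which IIRC went mainline in 2.6.33; the 2.6.31-rt patch spelled these with the atomic_ prefix mentioned above):
#include <linux/spinlock.h>
#include <linux/interrupt.h>

static DEFINE_RAW_SPINLOCK(dev_lock);  /* never converted to an rt_mutex */

/* hard-IRQ (IRQF_NODELAY) handler: safe, because this lock compiles
 * down to cli/sti-style IRQ disabling, not to rt_mutex_lock() */
static irqreturn_t my_isr(int irq, void* dev)
{
    unsigned long flags;

    raw_spin_lock_irqsave(&dev_lock, flags);
    /* ... crunch the numbers, fill the ring buffer ... */
    raw_spin_unlock_irqrestore(&dev_lock, flags);

    return IRQ_HANDLED;
}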

-ulianov

Thursday, December 3, 2009

A Nice Kernel Trick for x86

Today I learnt a nice way to obtain the number of microseconds since CPU power-up using the time-stamp counter introduced with the P5 (via rdtsc). This instruction returns the number of ticks since power-up, which can be scaled to microseconds by dividing it by the CPU frequency (in MHz).

The kernel code reads:
#include <linux/kernel.h>
#include <linux/cpufreq.h>
#include <asm/div64.h>   /* for do_div() */

unsigned long long uSecSinceBoot()
{
    volatile unsigned long long int cpu_ticks;

    __asm__("rdtsc\n\t"
            "mov %%edx, %%ecx\n\t"
            :"=A" (cpu_ticks));   /* "=A" returns the counter in edx:eax */

    unsigned long long div = cpu_ticks * 1000;
    do_div(div, cpu_khz); // TRICK! div is modified in place, do_div() returns the remainder

    return div;
}
I am using do_div() as on x86/32 bits long-long division is supported by GCC via an external function (__udivdi3) which usually lives in libgcc but is not provided by the kernel.

-ulianov

P.S. I did extract __udivdi3 from libgcc and then disassembled/reassembled it, but it's a pain. do_div is the right thing to use in the kernel.

Friday, September 18, 2009

The Strange Case of Multiplying Zombies

While working on a Linux/PPC embedded board I changed /etc/rc.sh (the system startup script as known to Busybox) to start our application in the foreground.

This was good, as we could see its output on the serial console and interact with it. The application is a standalone binary which is interrogated by CGI scripts thru a Web interface. The web server is thttpd.

So I had everything running and I looked at the process table and noticed it being filled by zombie (Z) entries of the logger CGI (which gets invoked thrice a minute).

I tried to trace (strace) thttpd, I even put a waitpid(-1) at the top of its main loop [it's a single-threaded web server] and still could not get the damn zombies reaped!

This was baad as the system could stay up only for half a day before filling up the process table.

I did some hard thinking, remembered some APUE and Bach bits, and concluded that Busybox [which alas contains init] must still have been waiting for /etc/rc.sh to terminate before starting the prescribed init behaviour, i.e. reaping orphaned processes and zombies!!

So I put our application in the background via nohup and voila! everything was good again.
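
The tail of /etc/rc.sh went from the foreground version to something like this sketch [/usr/bin/app stands in for our real binary]:
# before (bad): init never gets past rc.sh, so nobody reaps the orphans
#/usr/bin/app
# after (good): rc.sh exits and Busybox init takes over the reaping
nohup /usr/bin/app >/dev/console 2>&1 &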

-ulianov

Saturday, August 1, 2009

Mutex Problems on PPC, Again

One would think that doing
pthread_mutex_t mutex;
pthread_mutex_init(&mutex, NULL);
would yield a usable mutex.

Yet this is not always the case: on denx/PPC 440 strange things can happen (see the previous post When Classes Instantiated as auto Vars on Stack are Evil).

It turns out that the "correct" code sequence is
pthread_mutex_t mutex;
memset(&mutex, 0, sizeof(mutex)); // Voodoo for PPC
pthread_mutex_init(&mutex, NULL);
otherwise under some circumstances (e.g. mutexes ending up on the stack in the belly of an object) pthread_mutex_lock would block on this mutex forever at the first invocation. Stack memory starts out as garbage and, apparently, on that platform pthread_mutex_init alone did not clear every field it should have; presumably globals get away with it because they live in zeroed BSS.

-ulianov

Tuesday, April 7, 2009

When Classes Instantiated as auto Vars on Stack are Evil

I have a threaded C++ app that uses message queues to pass data among threads like so (a UML-ish sequence follows):
  Thread A (with response-Q) enqueues request in Thread B's input-Q
  Thread A blocks on response-Q empty
  Thread B wakes up [input-Q non-empty], dequeues request
  Thread B munches on the request from A
  Thread B enqueues result in A's response-Q
  Thread B blocks on input-Q empty
  Thread A wakes up [response-Q non-empty], dequeues response
  Thread A goes its merry way.

Nice and easy, eh? And it has worked OK for a while...

You know how it is: as one keeps adding code, the bugs that sat quietly on the shelf rear their nasty heads? It happens that in my case Thread A was the main thread and its response-Q was declared as
Queue respQ("A's response Q");
Now this is an auto var that lives on Thread A's stack.

Thread B was doing its job but when it wanted to enqueue the response in Thread A's respQ [it got a pointer to &respQ via a param] it would block in
pthread_mutex_lock()
in libc.

Bummer! I spent three hours writing LOCK/UNLOCK macros in class Queue that would confess who called them and from which thread, and I was matching the results (yes, gdb was borked on that ppc target and was useless with inf thr et al.), and I really saw that the Queue instance of Thread A was indeed blocking in
pthread_mutex_lock()
but nobody had locked that mutex before!!

The funny part is that I had that mutex properly initialised in Queue::Queue(); I even changed its attribute to error-checking but it just did not help! That mutex behaved as if uninitialised and full of garbage!

After a while you get bored of this kind of debugging so I changed Thread A's code to read
Queue* respQ = new Queue("A's response Q");
and everything went smooth afterwards.

This yields the following article of faith:
Objects declared as class auto on stack are evil.
-ulianov

Thursday, April 2, 2009

When GCC is Not Smart Enough to Help

Have you noticed that later versions of gcc have added lots of bogus warnings-turned-errors that suck the joy out of programming? Well, even with
-Wall -Werror
sometimes it won't help:
class X {
public:
    typedef enum {
        ERR1,
        ERR2
    } IOErrorCode;
    const char* IOstrerror(IOErrorCode c); // string describing the error
};

struct S {
    // ...
    X::IOErrorCode status;
};
On another day and in another file I coded:
struct S* res = (struct S*)malloc(sizeof(struct S));
if(! DoSomeIO(res))
    printf("IO/Error: %s (%d)\n",
           X::IOErrorCode(res->status), res->status); // (*)
and when I ran the program I would get a segfault at line (*), with GDB indicating that the stack was partially smashed! Niice!

After scratching my head for half an hour it occurred to me that I had made a mistake: I coded
IOErrorCode(res->status) // BAAD
instead of
IOstrerror(res->status) // OK
The former is [in C++] a function-style cast to type IOErrorCode, and handing its result to printf's %s will cause a crash inside printf().

The latter is a function call.

Ha! Not paying attention to my own code! And I had this sequence in five places handling I/O errors!

This is the most dangerous kind of error as one hits this code path infrequently (sometimes it only happens at a client's site thus driving the client mad).

-ulianov

Tuesday, March 31, 2009

When One Needs to Hide Things in Plain Sight on the WWW

Let's say that a web page needs to show content according to various local sensibilities and that it's hosted by a 3rd party that does not provide server-side scripting. What to do?

It's simple! Javascript+DOM+CSS come to the rescue: simply have a <div> with style="display:none". Store your content there in a plain/text scrambled form.

Then load a JavaScript script from a server that you control as the output of a server-side script.

Decide in your server-side script whether you wish to authorize the user based on the $REMOTE_ADDR to view the content and spit out either the decrypting JavaScript code or some dummy code.

Simple, eh?

The negative side-effect is that the search engines won't index your content. The positive side-effect is that the search engines won't index your content (maybe you don't want other people using it for their own purposes, e.g. keywordspy·com) and spammers won't be able to harvest e-mail addresses from your pages ;)

-ulianov

Monday, March 30, 2009

When Networking != IPC

To some people it stands firmly to reason that
Networking != IPC
This boggles the mind: they see the only way for two (or more!) applications [living on the same machine] to communicate as SHM plus semaphores (aka "mailbox & lock"). They say it's faster.

It is -- Stevens states in UNP that such an approach is 30% faster than AF_UNIX sockets.

However there are some minor drawbacks to this communication pattern:
a. it cannot be extended across hosts (with sockets this is transparent, endianness notwithstanding);
b. it is only half-duplex (no better than the venerable pipes/FIFOs) so you need two such message queues;
c. there is no notification of a peer's death (TCP/IP sockets can send keep-alives, and these can be tuned);
d. the notification of a pending message is wasteful and at best awkward: one needs to dedicate a thread on the receiving side to block on the semaphore; this can be mitigated with pthread_cond_timedwait but it cannot be mixed with select-ing on sockets, so you'll end up with a thread babysitting the mailboxes;
e. if there is more than one receiver process then the receivers must hunt down the messages addressed to them in the mailboxes; worse, if one of the receivers gets bored and exits, its messages are cleaned up by no-one (can I smell a DoS?);
f. the data that can be transported via mailboxes is limited by the size of the mailboxes and one may have to resort to fragmentation -- things can get very hairy at this point;
g. this pattern is not observable, i.e. one cannot use tcpdump to look at the packets going to and fro; one must build custom tools to observe the data flow;
h. depending on the type of SHM used (e.g. SysV SHM) the mailboxes and their contents may persist after the death of all the parties involved; this can be good if one wanted that, or very bad if the server process must clean up at start-up.

To me such socket-phobia is inexplicable (thinking with a UN*X programmer's mind). I do recall tho the contortions, the horrendous API and the sad performance of Winsock. Yet have I mentioned that this whole mailbox/lock brouhaha has to happen on Linux? A small sketch of what sockets buy you follows.
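
To make point (d) concrete, a minimal sketch: an AF_UNIX socketpair is full-duplex and select()-able, with none of the mailbox babysitting:
/* uds-pair.c -- sockets vs mailbox/lock, the short version */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/select.h>

int main(void)
{
    int sv[2];

    if(socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        perror("socketpair");
        return 1;
    }

    write(sv[1], "ping", 4);     /* peer B sends... */

    fd_set rd;
    FD_ZERO(&rd);
    FD_SET(sv[0], &rd);          /* ...and peer A can select() on its end */
    if(select(sv[0] + 1, &rd, NULL, NULL, NULL) > 0) {
        char buf[16];
        ssize_t n = read(sv[0], buf, sizeof(buf));
        printf("got %d bytes: %.*s\n", (int)n, (int)n, buf);
    }
    return 0;
}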

-ulianov

Saturday, March 28, 2009

A Nightmarish Fantasy

Please stand by...


Friday, October 10, 2008

Unintended Spam Consequences to Webapps

Some time ago I wrote a webapp that would convert long URLs into fixed-length ("tiny") ones. Yes there is tinyurl.com out there but I keep forgetting the short URLs.

Thus it was easier for me to write it from scratch and be able to look at the database backend whenever I wish. The application has a verification code embedded in a scrambled image so I am sure that only humans can generate the short URLs.

I made the mistake of posting its location on my home page (which is being prowled by Google and other bots).

It just happens that some scammers/spammers used it today to spam people: they made a redirection to an image containing an ad for counterfeit watches. Now I get lots of hits from people reading their e-mails (and I see scores of webmail providers being hit by this spam).

Naturally I changed the stored links so people now get a logo of a US law enforcement agency and when they click the hook & bait link in their e-mails they end up at the website of the said agency.

It pisses me off that I get one hit every other second and my web server is under a high load average, as I installed the Perl webapp as CGI scripts instead of using mod_perl.

This will eventually force me to upgrade my ancient system (both as hardware and OS version) and put mod_perl in place.

-ulianov

Saturday, August 30, 2008

Log messages to dmesg/kernel buffer from user-space

From time to time a buddy of mine who's not a professional software developer asks interesting questions, albeit ones that go against the grain of UN*X programming.

This time he asked whether printk is accessible to user-mode applications. As per entry.S it is not, but the guy is stubborn, so I got bored of explaining the impossibility, got to work and wrote a Linux kernel module that allows him to fulfill his fantasy.

The technical article is here: http://spamchk.no-ip.org/uprintk.html. The source code is here.

-ulianov

Friday, August 15, 2008

Fooling Around with jiffies

Some time ago a QA buddy asked whether the Linux uptime can be changed on the fly in order to study the behaviour of a closed-source application that would only exhibit a certain bug after two weeks of running.

I looked at sys_gettimeofday (Linux kernel 2.4) and learned that it depends on jiffies. So I wrote a small module that adjusts jiffies at will.

The technical article is here: http://spamchk.no-ip.org/kernuptime.html. The source code is here.

-ulianov

Thursday, July 3, 2008

Go easy on "rm -fr"

Here I am trying to improve and extend an old script that read:
#!/bin/bash

[ ! -d ~/stage ] && exit 1

(cd ~/stage; rm -vfr *)

...
The new version read:
#!/bin/bash

STAGE=$HOME/stage

[ ! -d "$STAGE" ] && exit 1

(cd $STAGE; rm -vfr *)

...
Guess what? I managed to wipe out a good chunk of my $HOME!!

The explanation is that (cd $STAGE; rm -vfr *) executes in a new (child) copy of the shell [I wrote the code like this so I don't have to cd back after the removal].

The STAGE variable is set in the parent shell but was not seen in the child shell because I did not export it, so what got executed was in fact (cd; rm -vfr *) [just saying "cd" in bash takes you back to your $HOME].

The proper way to do this was:
#!/bin/bash

STAGE=$HOME/stage; export STAGE;

[ ! -d "$STAGE" ] && exit 1

(cd $STAGE; rm -vf *)

...
Note that I added the export STAGE statement and dropped the "-r" from the remove command, as I really had no sub-directories in ~/stage.

-ulianov

Tuesday, June 24, 2008

Cargo-Cult Programming

Here is an example I encountered of a chronic cargo-cult programming affliction in a junior engineer -- all the code he wrote read:
if(CONSTANT == var) {
// something
} else {
// something else
}
The reasoning behind this is a fake "defensive programming" strategy: he protected himself against
if(var = CONSTANT) { ... }
which does not do a comparison, it does an assignment [and the branch is silently taken unless CONSTANT happens to be 0].

This is just bad coding. First, it's an eye-sore; second, this programmer was not aware that newer GCC versions warn about this when using -Wall and even halt compilation when that is used in conjunction with -Werror.

The fact that he was very stubborn did not help either.

-ulianov

Monday, June 23, 2008

The Forgotten Junk in libc

libc is a patchwork of stuff: syscalls and helper functions that got set in stone aeons ago, warts and all. During the bad times unthinking people stuck junk in it such as:
  • atoi() -- does not return errors;
  • gets() -- has no way of knowing the length of the buffer it populates;
  • strcpy() -- the all-time favourite way of causing a crash: if the src pointer is junk or has no '\0' then the buffer pointed to by dst will be filled beyond its boundaries with garbage;
  • ditto strcat();
  • sprintf() -- what if the stuff you wish to print into the buffer exceeds the buffer's length?
  • heck, even strlen(NULL) will crash;
  • strtok() -- this one is just evil.
DO NOT USE THESE CALLS! I cannot count how many times I had to hunt down crashes and refactor code using this garbage.
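
For the record, a sketch of the usual bounded/checkable replacements -- nothing exotic, plain POSIX:
/* the same jobs, but with bounds and error returns */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

int main(void)
{
    char line[128];

    if(fgets(line, sizeof(line), stdin) == NULL)   /* instead of gets() */
        return 1;

    char dst[32];
    snprintf(dst, sizeof(dst), "%s", line);        /* instead of sprintf()/strcpy() */

    errno = 0;
    char* end;
    long v = strtol(line, &end, 10);               /* instead of atoi() */
    if(end == line || errno == ERANGE)
        fprintf(stderr, "not a (sane) number\n");
    else
        printf("parsed %ld\n", v);

    char* save;                                    /* instead of strtok() */
    char* tok;
    for(tok = strtok_r(line, " \t\n", &save); tok; tok = strtok_r(NULL, " \t\n", &save))
        puts(tok);
    return 0;
}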

-ulianov

Friday, June 20, 2008

Why atoi() is NOT Your Friend

atoi() is a hold-over from the bad times when people put junk such as strcpy() and gets() into libc.

It is unsuitable as it has no way of returning an error, e.g.
atoi("adhgsjgdas") = 0
Alas many people still use it.

Please, please use instead:
if(sscanf(str, "%d", &int_var) != 1) {
    // handle the error!!!
}
-ulianov

Thursday, June 19, 2008

Passing Params thru Registers Woes on x86

One year ago I was asked to look into a problem some colleagues had with a networking framework that lived as a set of Linux kernel modules.

The problem they saw was that when certain framework functions were called a parameter [which was a pointer] contained garbage.

They had the hairy idea to start using "-mregparm=3" to compile the kernel, altho until then we had lived happily with stack-based calls.

I looked at the code and the makefile and here is how gcc was invoked:
gcc -ggdb -O2 -pipe \
-mregparm=3 \
-Wall -Wstrict-prototypes -Wno-trigraphs \
-fno-strict-aliasing -fno-common \
-fomit-frame-pointer \
-mpreferred-stack-boundary=2 \
-march=pentium code.c
and here is what the offending code looked like [not a verbatim copy]:
#include <stdio.h>
#include <stdlib.h>
#ifdef FORCE_STDARG
#include <stdarg.h>
#endif

#define ENOMEM  -2
#define EBADSLT -3
#define ENODEV   0

typedef int (*func_t)(void*, ...);

struct _S2; // forward declaration
struct _S1 {
    char filler1[3];
    func_t f1;
    int (*f2)(long, long, struct _S2*, long);
};
struct _S2 {
    char filler1[13];
    long filler2;
    struct _S1* s;
    char filler3[7];
};

struct _S1 g_S1;
struct _S2 g_S2;

#ifdef FORCE_STDARG
int f1(struct _S1* s, ...)
#else
int f1(struct _S1* s)
#endif
{
#ifdef FORCE_STDARG
    va_list ap;
    va_start(ap, s);
#endif
    if(s != &g_S1)
        return -1;
#ifdef FORCE_STDARG
    va_end(ap);
#endif
    return 1;
}

int f2(long i1, long i2, struct _S2* s, long i3)
{
    if(s != &g_S2)
        return -1;
    return 1;
}

int main()
{
    g_S1.filler1[0] = 'A';
    g_S1.f1 = (func_t)f1;
    g_S1.f2 = f2;

    g_S2.filler2 = 13;
    g_S2.s = &g_S1;

    if(g_S1.f1(&g_S1) < 0)
        return -ENOMEM;
    if(g_S2.s->f2(1, 2, &g_S2, 3) < 0)
        return -EBADSLT;

    return -ENODEV;
}
I noticed by comparing the disassembled code (objdump -dSl code.o) that the call setup for
g_S1.f1(&g_S1);
was the same regardless of "-mregparm" -- this is because the compiler applies the prototype
typedef int (*func_t)(void*, ...);
whose "..." forces it to put the arguments on the stack.

However the declaration
int f1(struct _S1* s);
in connection with "-mregparm=3" has f1() looking for its first argument in %eax which contains some garbage!!

Hint: compile the code with -DFORCE_STDARG and without and see the difference in execution.

The moral is two-fold:
1. if you use "..." in a function prototype then also use it in the function implementation!!;
2. in the kernel passing function arguments thru registers will yield little or no gain in execution speed (as only the leaf functions will fully benefit from the stack-free operation).

-ulianov

Tuesday, June 17, 2008

Bad Idiom: Using Strings on Stack

Here's what I've seen recently:
#include <stdexcept>
#include <cstdio>

Class::Class(int arg)
{
    ...
    if(/* error */) {
        // original code
        //throw std::runtime_error("Class::Class: Wrong argument!");

        // later addition
        char str[100];
        sprintf(str, "Class::Class: argument %d is too big", arg);
        throw std::runtime_error(str);
    }
    ...
}

int main()
{
    Class* c;
    try { c = new Class(100000); } // say it's allocating memory
    catch(std::runtime_error& e) {
        printf("Got an error: %s\n", e.what());
    }
}
The code sins in many ways:
1. the catch() is wrong, as the code does not throw a reference but an object!
2. the throw() is poisoned: it throws a string living on the stack, which goes out of scope when the constructor is exited; the application won't crash immediately [the stack pages are not unmapped];
3. the text of the exception will be junk (and may lack a \0 terminator) and attempting to print it may crash the app in funny and random ways.

The person who butchered the code and abused the class hierarchy suffers from the cargo-cult programming syndrome because he:
1. did not pay attention to the original code;
2. did not bother to read the Doxygen documentation of the original classes;
3. does not understand the life cycle of an auto var allocated on the stack;
4. did not analyse all the implications of using and changing other people's code.

-ulianov

Monday, June 16, 2008

Compressed Output from Perl CGI Scripts

You know how you say in PHP
<?php ob_start("ob_gzhandler"); ?>
I wanted the same for a Perl CGI script so I dug a bit on the Net and here's the code I came up with:
use strict;
use Compress::Zlib qw(gzopen);

print <<EOF;
Content-Type: text/html
Content-Encoding: gzip

EOF

my $print;
binmode STDOUT;
{ # closure: only the $print sub keeps $gz alive
    my $gz = gzopen(\*STDOUT, "wb");
    $print = sub { $gz->gzwrite(@_) };
}

$print->("Some content....");
Mind you, I am not using CGI.pm as it's a memory hog and in this particular script I did not have to parse form variables.

On another note importing just what you need from a Perl module [i.e. qw(gzopen)] is a great way to cut down on memory consumption.

-ulianov

Saturday, June 14, 2008

Smashed Stack in a Multithreaded Application

One of the most annoying things in debugging an app is to have it crash and GDB show garbage for the topmost 3-4 frames.

There is not much to do other than diff your current code (that is crap!) against the last known working version you have in revision control (I hope you do have that!).

-ulianov

Friday, June 13, 2008

The Fine Distinction Between a Pointer and an Array

I keep finding crashes caused by people not comprehending what C pointers and arrays really are.

Definitions:
1. array = a variable that was declared as a collection of objects:
char x[100];
be it a global or, even worse, on the stack;
2. pointer = something that has been malloc'ed or something that holds the address of another object:
char* x1 = (char*)malloc(100);
or
char c;
char* x2 = &c;
A pointer may live as a global or on the stack; regardless, its memory footprint is 4 bytes (on a 32-bit architecture).
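
A quick way to convince yourself of the difference -- sizeof tells the two apart:
/* array vs pointer footprint */
#include <stdio.h>

char x[100];      /* array: 100 bytes of storage */
char* x1 = x;     /* pointer: 4 bytes on a 32-bit box */

int main(void)
{
    printf("sizeof(x)=%u sizeof(x1)=%u\n",
           (unsigned)sizeof(x), (unsigned)sizeof(x1)); /* 100 vs 4 */
    return 0;
}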

To many people pointers are arrays and arrays are pointers. True, but not all the time.

To simplify the discussion let us assume that a segfault occurs when we go beyond the boundaries of a buffer regardless of access type (read/write), its size and alignment.

Let's assume you populated x and x1 with a 99-byte message and you say:
write(STDOUT_FILENO, &x, 99); // mind the \0!
This is correct.

If you say:
write(STDOUT_FILENO, &x[0], 99); // mind the \0!
This is also correct.

If you say:
write(STDOUT_FILENO, x1, 99); // mind the \0!
This is correct.

If you say:
write(STDOUT_FILENO, &x1, 99); // mind the \0!
This is a SEGFAULT because you will try to access the (99-4) bytes that follow x1 on the stack; there is nothing mapped there and your program will crash and burn!

The explanation is that
x == &x == &x[0]
but
x1 != &x1
as x1 means the allocated memory buffer to which x1 points, whereas &x1 means the address in memory where the pointer x1 itself lives!!!

In real life it gets even worse: if x1 lives on the stack and you write to &x1, you put garbage on the stack and may go past the red page. If some other auto vars live on the stack after it and happen to be pointers, then you have a recipe for a clusterfuck.

If x1 is a global then you will corrupt other globals that live after x1. This is even more fun!

-ulianov

Unlock of Mutex Failed

This was one of the most puzzling problems I have faced. The Teja application we had was failing once in a blue moon, printing the message "unlock of mutex failed".

Looking at the code I saw that pthread_mutex_unlock returned an error, which should never happen.

I spent a month writing a wrapper for the pthread mutex library (a mutex in this particular case is a 64-byte memory area; futex was not called).

I put signatures before and after the mutex, checked signatures & various mutex fields before entering the true mutex call to no avail.

It was a random memory corruption that I eventually solved elsewhere, so this error ceased to happen. The gist of the fencing wrapper follows.
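
A sketch with made-up names -- fence the mutex with signatures, check them on every lock/unlock, and die close to the corruption instead of once in a blue moon later:
/* guarded_mutex.c -- canary-fenced pthread mutex wrapper */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define SIG1 0xdeadbeefu
#define SIG2 0xfeedabbau

struct guarded_mutex {
    unsigned sig1;
    pthread_mutex_t m;
    unsigned sig2;
};

static void gm_check(struct guarded_mutex* g, const char* who)
{
    if(g->sig1 != SIG1 || g->sig2 != SIG2) {
        fprintf(stderr, "%s: memory around mutex %p corrupted!\n", who, (void*)g);
        abort();  /* die here, not a blue moon later */
    }
}

void gm_init(struct guarded_mutex* g)
{
    g->sig1 = SIG1;
    pthread_mutex_init(&g->m, NULL);
    g->sig2 = SIG2;
}

int gm_lock(struct guarded_mutex* g)
{
    gm_check(g, "lock/pre");
    int rc = pthread_mutex_lock(&g->m);
    gm_check(g, "lock/post");
    return rc;
}

int gm_unlock(struct guarded_mutex* g)
{
    gm_check(g, "unlock/pre");
    int rc = pthread_mutex_unlock(&g->m);
    gm_check(g, "unlock/post");
    return rc;
}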

-ulianov

Thursday, June 12, 2008

Re-Inventing the Wheel is Evil

Once I had to debug a program that ran in the Teja application framework for IXP 1200. We experienced random crashes at sites that were using a Walled Garden wireless authentication method.

By another name a Captive Portal, this consists of a TCP stream hijacker. The intercepted stream is HTTP. The component that handles this functionality simply pretends to be any HTTP destination, analyses the request and sends back a crafted HTTP/307 redirection message in which the original URL is encoded as a GET parameter.

The Teja module that implemented the redirection got as an input an Ethernet frame (TCP reassembly was not performed but that was not a practical problem as most people simply go to http://yahoo.com, i.e. use short URLs that fit within one frame).

The data structure that enveloped the Ethernet frame was called Packet and it looked like:

struct Packet {
    char buffer[1500];
    short start;
    short end;
};
To an astute reader this resembles an skbuff.

The code was parsing the input request and sending back a 307 that looked like "Location: http://auth.host/login.php?dest=OriginalURL"

The response was kept in a 6000-byte buffer which was large enough but here's what they were doing:

memcpy(packet.buffer+packet.end, response, response_len);
packet.end += response_len;
See anything wrong? If response_len > 1500 then the memcpy thrashes packet.end, then one adds something to it, resulting in even bigger junk. And how about the junk in packet.start?

The Packets were malloc'ed [thus living in a pool of objects and having neighbours], so this had the potential (if len > 1508) to corrupt the "secret stuff" the GNU allocator keeps before each malloc'ed block. That causes a happy crash when you later want to free that pointer!

Later on the packet was copied into a hardware buffer that lived in IXP DRAM mapped in the Arm CPU address space like so:

memcpy(pHwBuf->buffer, \
packet.buffer+packet.start, \
(packet.end-packet.start));
Here you could copy garbage from a garbage address (packet.buffer+packet.start) [no crash if the memory was mapped] with a garbage length (packet.end-packet.start), thus thrashing memory left and right in unexpected places.

Murphy's Laws dictate that the memory belonged to other threads of execution and that those threads will crash at another time, in such a way as to turn you green.

In order to track that I had to change the Packet like so:

struct Packet {
    unsigned sig1;   // set to 0xdeadbeef when allocating a Packet
    char buffer[1500];
    unsigned sig2;   // 0xfeedabba
    short start;
    short end;
    unsigned sig3;   // 0xdeafabba
};
and check the signatures after each Packet operation. It took me only a month and one very pissed customer to fix this (not my code).

-ulianov

P.S. The Linux skbuff infrastructure comes with a lot of accessor functions that prevent this kind of crap from happening, by Oops'ing instead.