September 2011 Archives

Exit statuses and how $? works

The other day I was wondering about $? in Perl and the shell and how exit statuses work. I did some digging and now I'd like to talk a bit about exit statuses at the OS level and how Perl and the shell deal with them.

First off, there are two ways a unix process can terminate. One is by calling _exit, the other is getting killed by a signal. In both cases the resources of the process (such as memory or file descriptors) are cleaned up. All that remains is an entry in the process table (i.e. the PID is still taken) and some status information on how the process died. This "stale" process table entry is called a zombie process.

After a process has terminated, it's its parent's job to clean up after it by calling wait or simply exiting itself (a process whose parent has died is known as an orphan process; it will be adopted and cleaned up by init, the process with id 1).

In C, a successful call to wait gives you a status value of type int. The header <sys/wait.h> provides several macros to examine this status, such as:

  • WIFEXITED - true if the process died by calling _exit
  • WEXITSTATUS - the number passed to _exit (only valid if WIFEXITED is true)
  • WIFSIGNALED - true if the process was killed by a signal
  • WTERMSIG - the signal that killed it (only valid if WIFSIGNALED is true)

Something that may not be obvious: the "exit status" consists of a single byte, so only the 8 lowest bits of the number you pass to _exit will be used. In other words, WEXITSTATUS will always be in the range [0, 255].
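A minimal sketch of the above, using Python because its os module exposes the same W* helpers as <sys/wait.h>. The names wait_for_child and describe are made up for this illustration, and the sketch assumes a unix system where os.fork is available.

```python
import os
import signal


def wait_for_child(child_action):
    """Fork, run child_action in the child, and return the raw wait status."""
    pid = os.fork()
    if pid == 0:
        # Child process: run the action, then leave without returning
        # into the parent's code.
        try:
            child_action()
            os._exit(0)
        except BaseException:
            os._exit(1)
    # Parent process: reap the child and collect its status.
    _, status = os.waitpid(pid, 0)
    return status


def describe(status):
    """Classify a raw wait status with the W* helpers."""
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("other", status)


if __name__ == "__main__":
    # 300 does not fit in one byte; only the low 8 bits survive.
    print(describe(wait_for_child(lambda: os._exit(300))))
    # A child that kills itself with SIGKILL.
    print(describe(wait_for_child(
        lambda: os.kill(os.getpid(), signal.SIGKILL))))
```

The first call shows the truncation: 300 & 0xFF is 44, so that is what WEXITSTATUS reports.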

The above interface is fairly abstract; it doesn't define the details of how signal numbers and exit statuses are encoded. But there is a traditional way unix has done this, and it's also how Perl does it.

  • The status value in $? is a 16-bit number.
  • The low 8 bits are nonzero if the process died from a signal.
  • Of those, the low 7 bits ($? & 127) are the signal number; the eighth bit ($? & 128) is set if the process dumped core.
  • Otherwise the exit status is stored in the high 8 bits ($? >> 8).

Perl always computes $? to look like this, even if your OS doesn't encode status information this way natively. This means a C program may get different status bits from wait than what a Perl program would see in $?. If you really need your OS's native status code in a Perl program, you can use ${^CHILD_ERROR_NATIVE} and the W* functions from POSIX.
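To make the bit layout concrete, here is a small sketch that unpacks a Perl-style 16-bit status value. It is written in Python purely for illustration; decode_perl_status is a hypothetical helper, not something Perl or Python provides.

```python
def decode_perl_status(status):
    """Unpack a Perl-style 16-bit $? value into its three fields."""
    return {
        # High 8 bits: the value passed to _exit.
        "exit_code": (status >> 8) & 0xFF,
        # Low 7 bits: the signal number (0 if no signal was involved).
        "signal": status & 0x7F,
        # Eighth bit (value 128): set if the process dumped core.
        "core_dumped": bool(status & 0x80),
    }


if __name__ == "__main__":
    for value in (0, 256, 11, 139):
        print(value, decode_perl_status(value))
```

For example, 256 decodes to exit code 1 with no signal, while 139 decodes to signal 11 with the core dump bit set.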

(The following information was taken from the bash manual but it probably applies to all Bourne-style shells.)

Finally there is another $? variable in the shell. It has the same name and function as Perl's, but its values work differently. In particular, it's always an 8-bit number, not 16 bits as in Perl.

  • If a process exits normally, its status is in $?.
  • If it is killed by a signal, $? is set to 128 + $signo (where $signo is the signal's number).

Example: On my system SIGSEGV is 11, so after a segfault the shell sets $? to 128 + 11 = 139.

Other synthetic $? values include 127 (command not found) and 126 (command found but it wasn't executable).
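The shell's convention can be sketched as a one-line mapping. The example below uses Python's subprocess module, which reports "killed by signal N" as a return code of -N; shell_status and run_and_report are hypothetical names for this illustration, and the synthetic 126/127 values are not modeled.

```python
import subprocess
import sys


def shell_status(returncode):
    """Map a subprocess return code to the value a Bourne-style shell
    would store in $?."""
    if returncode < 0:
        # subprocess encodes death by signal N as -N;
        # the shell reports it as 128 + N.
        return 128 + (-returncode)
    return returncode


def run_and_report(argv):
    """Run a command and return the shell-style status for it."""
    return shell_status(subprocess.run(argv).returncode)


if __name__ == "__main__":
    # A child that exits normally with status 3.
    print(run_and_report([sys.executable, "-c", "raise SystemExit(3)"]))
    # A child that kills itself with SIGKILL.
    print(run_and_report([
        sys.executable, "-c",
        "import os, signal; os.kill(os.getpid(), signal.SIGKILL)",
    ]))
```

On a system where SIGSEGV is 11, shell_status(-11) gives 139, matching the segfault example above.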

Why you probably don't want to use claws-mail

I've been using claws-mail as my email client for a while. Today I wanted to search all of my messages for some text and look through the results. When I couldn't figure out how to do that even after consulting the claws manual, I went to their official IRC channel.

For reference, here are some screenshots of the dialogs in question: claws-esearch-information-mtsucks.png, claws-esearch-edit.png

This is what happened:

17:55 -!- .join mauke!~mauke@p3m/member/mauke #claws
17:55 -!- Irssi: #claws: Total of 33 nicks [8 ops, 0 halfops, 0 voices, 25 normal]
17:55 -!- Irssi: Join to #claws was synced in 1 secs
17:55 <mauke> hello, I am raging at claws-mail
18:08 <mauke> basically, the search feature is undocumented and I
              can't figure out how it works
18:15 <mohaa> "undocumented" ?
18:16 <mauke> yeah, like:  When in "Extended" mode, the "Information"
              button is visible, enabling you to see the search
              syntax.
18:17 <mauke> doesn't "enable" shit
18:17 <mauke> "Run on select" isn't even mentioned in the manual
18:17 <mauke> what is 'S' in the Information window?
18:18 <mauke> after I run a search, where are the results?
18:18 <mauke> what exactly does 'E' do and how is it useful?
18:19 <mauke> what is the unit of '#' in 'ag'?
18:19 <mauke> how is % supposed to be used?
18:19 <mauke> "all filtering expressions are allowed" - what are
              "filtering expressions" and what does "allowed" mean
              here?
18:20 <mauke> how is E different from X?
18:20 <mauke> are parens allowed? if not, how do expressions work?
18:21 <mauke> what's the unit of # in Se/Sg/Ss?
18:21 <mauke> why does the description of 'tg' ("messages which tags
              contain S") randomly use "which" instead of "whose"?
18:22 <mauke> how does 'k' work?
18:22 <mauke> are the colors numbers?
18:24 <mohaa> "parens" ?
18:24 <mauke> yes
18:25 -!- .join krayon!~fallen@pdpc/supporter/28for7/krayon #claws
18:25 <mauke> I mean, how else does grouping work?
18:47 -!- .join andyrtr_laptop!~andyrtr@f053200240.adsl.alicedsl.de #claws
19:12 -!- .quit andyrtr!~andyrtr@archlinux/developer/andyrtr [Quit: Ex-Chat]
19:22 -!- .join Columbo0815!~foobar@HSI-KBW-085-216-116-089.hsi.kabelbw.de #claws
19:24 -!- .join claws_!~claws@5ad1377a.bb.sky.com #claws
19:24 -!- .mode clunker3 #claws [+o claws_]
19:25 -!- .nick claws_ -> claws
19:42 -!- .quit Columbo0815!~foobar@HSI-KBW-085-216-116-089.hsi.kabelbw.de [Quit: Verlassend]
19:43 <@claws> mauke: you're funny :)
19:44 <mauke> why?
19:45 <@claws> maybe you wouldn't rage so much if you'd never found
               the feature or the feature didn't exist
19:45 <mauke> no, my main problem is the lack of a good search feature
19:45 <mauke> that's how this all started
19:45 <@claws> anyway, ignore the information dialog and click the
               edit button - this is probably easier to digest
19:46 <mauke> no, the edit button seems to lack all features
19:46 <@claws> the quick search is a _very good_ search feature imo
19:46 <@claws> no, the edit button does not
19:46 <mauke> claws: yes, but I can't figure out how to use it
19:46 <mauke> see above
19:46 <@claws> have you worked out how to use Filtering in Claws?
19:46 <mauke> no
19:46 <@claws> heh, thought not
19:47 <mauke> btw, I blanked some spam pages in the wiki
19:47 <@claws> heh, yes. there are always new spam pages in the wiki
19:47 <@claws> thanks. saved me a job
19:48 <@claws> back to the quick search, what are you likely to want
               to search for?
19:48 <mauke> text
19:48 <mauke> I want a full text search over everything
19:48 <@claws> b 'my search term' in Extended mode
19:48 <@claws> b is for body
19:48 <mauke> what happens if I omit the quotes?
19:49 <mauke> what happens with double quotes?
19:49 <mauke> how do I search for '?
19:49 <@claws> the quickest way to find your answer is to try it
19:50 <mauke> I did, results make no sense
19:50 <@claws> is that possible, perhaps?
19:50 <@claws> made no sense? in what way?
19:50 <mauke> why can't I just read a reference manual?
19:50 <@claws> dunno ...because you wouldn't??
19:50 <mauke> well, first there's problem #1: <mauke> after I run a
              search, where are the results?
19:51 <@claws> because you haven't written it yet?
19:51 <@claws> the results are in the msg list - this list is
               shortened, based on the results
19:51 <mauke> what msg list?
19:51 <@claws> that's quite obvious, i think, no???
19:51 <mauke> I have a nested folder structure
19:51 <@claws> now, m[e]s[sa]ge list _is_ in the manual!!
19:52 <@claws> the message list is not the folder list
19:52 <mauke> message list is the contents of one folder
19:52 <mauke> this is useless
19:52 <mauke> I don't want to manually click through all folders just
              to see if one of them maybe contains some search results
19:53 <@claws> then search in the top-level folder and use a recursive
               search
19:53 <mauke> I did
19:53 <@claws> that's not useless
19:53 <mauke> where are my results?
19:54 <@claws> a folder which has results has its folder icon replaced
               with a magnifying glass. select that folder and you'll
               see your results
19:54 <mauke> so I still have to manually click through the folders?
19:54 <mauke> blargh
19:54 <mauke> who came up with this UI
19:54 <@claws> why do you ask?
19:55 <mauke> that was more of a rhetorical question
19:55 <@claws> more like a rude question, methinks
19:55 <mauke> the search seems to ignore everything after the first
              word
19:55 <@claws> then quote
19:55 <mauke> is that correct?
19:56 <@claws> doesn't a test answer your question?
19:56 <mauke> no
19:56 <mauke> I can't rule out that it has some effect I didn't notice
19:57 <mauke> hence "seems to ignore"
19:57 <@claws> is it me, but you seem to have an attitude??
19:58 <mauke> I do, but not where you think
19:58 <mauke> this confuses me a bit
19:59 <@claws> you do, but not where i think?? that confuses me
19:59 <mauke> I've been trying to be nice and civil at the end there
19:59 <@claws> maybe the beginning would have been better, but
               whatever!
19:59 <@claws> the quick search is good, it works. maybe it doesn't
               work how you want it to work, but there it is
20:00 <mauke> my main problem is not that it doesn't work how I want
              it to, it's that it's not documented
20:00 <mauke> so I don't even know if it does what I want
20:01 <mauke> the list at the beginning is the "obvious" questions I
              had when I saw the [Information] window
20:01 <mauke> the [Edit] dialog doesn't seem to allow a full text
              search
20:02 <mauke> and I didn't notice the folder icon change at all
20:02 <mauke> what I find weird is that you can't give me a straight
              answer as to what "b x y z" does
20:03 <mauke> normally I'd expect "RTFM" in response to that but there
              doesn't seem to be a FM
20:03 <mauke> you said I should try it; I did. now I'm simply asking
              for confirmation
20:04 <mauke> and all I get is "you seem to have an attitude"
20:08 <@claws> in the edit dialogue use phrase --> in body part
20:08 <mauke> oh wow
20:09 <@claws> maybe that's "all" you get because that's "all" you
               gave?? :)
20:09 <mauke> I didn't even think to look there because Header, Age,
              Flags, ... are other top-level categories
20:09 <@claws> in this case it's not RTFM, it's WTFM :)
20:10 <mauke> so I in analogy to Header I was looking for Body or
              Message or whatever
20:10 <mauke> yeah, Phrase is the odd one out
20:10 <mauke> the other items specify which part/attribute of the
              message to search *in*; Phrase describes what kind of
              thing you're looking *for*
20:11 <mauke> claws: dude, I asked you a simple yes/no question
20:11 <mauke> you could answer 1) yes, 2) no, 3) rtfm at ...,
              4) you're stupid because ...
20:12 <@claws> but none of those fitted my answer
20:12 <mauke> but this "I don't like your attitude" bullshit is
              extremely annoying
20:12 <@claws> yep, i thought the same
20:12 <mauke> so why are you doing it?
20:13 <@claws> i'm not any more, i'm going to do something else
20:13 <mauke> sorry for assuming this program was meant to have users

Update 1:

  • Changed the title from "shouldn't" to "don't want to" because that's closer to what I meant.
  • Reformatted the log a bit to prevent the right side being cut off.

Update 2:

Oh wow, it keeps getting better. I really wanted to know how this #!$@ actually works so I downloaded the source code. It comes with a fairly long README that (among other things) explains Quick Search better than both the "user manual" and the built-in help system. It even has examples! First the filter engine syntax:

    from regexpcase "foo"
    subject regexp "Bug" & to regexp "claws-mail"

Then the extended patterns:

    # means number
    S means regexp string
    f "john beavis"    messages from john beavis
    %f "John Beavis"   messages from John Beavis (case sensitive)
    ~s foo             messages which do not have foo in the subject
    f foo & ~s bar     messages from foo that do not have bar in the subject

Still missing:

  • What units are the # numbers in?
  • What's the syntax for a "regexp string"?
  • What regexp flavor?

But it does show that you can use double quotes and that the % should be placed immediately in front of one of the string operators.

(BTW, "S means regexp string" is the only occurrence of "regex" or "regul.*ex" in README, doc/, or manual/.)

...

Now I've looked at the source code. Turns out claws-mail uses a rewriting system to convert "extended patterns" into "filter syntax" internally. The actual syntax is a bit complicated:

  • Tokens in the search expression are space separated.
  • Every term in the expression starts with an (optionally prefixed) command.
  • The prefixes are !, ~ and %.
  • You can have at most one of ! and ~ (both mean "not"). If you have one, it must appear first.
  • You can also have % (it means "case sensitive").
  • If prefixes are present, they must appear in the above order with no whitespace in between (example: !%f).
  • Then comes the command code.
  • Then the argument (if the command takes one).
  • If the argument starts with a " (double quote), it ends at the next ". There is no way to embed a " in the string. Otherwise it ends at the next ' ' (space, ASCII 32) character (not the next whitespace character!).
  • As far as the rewriter is concerned, & and | are just commands without arguments, not operators between commands. That means there's no way to nest them (it also means ~& is accepted).
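The rules above can be sketched as a small tokenizer. This is not claws-mail's code, only an illustration of the described behavior in Python. The TAKES_ARGUMENT table is a hypothetical subset of command codes, and skipping extra spaces before an argument is an assumption.

```python
# Hypothetical subset of command codes: True means the command takes an
# argument. The real table lives in the claws-mail source.
TAKES_ARGUMENT = {"f": True, "s": True, "b": True, "&": False, "|": False}


def parse_terms(expr):
    """Split an extended search expression into
    (negated, case_sensitive, command, argument) tuples."""
    terms = []
    i, n = 0, len(expr)
    while i < n:
        if expr[i] == " ":
            i += 1
            continue
        # One space-delimited token: optional prefixes plus command code.
        j = expr.find(" ", i)
        if j == -1:
            j = n
        token = expr[i:j]
        i = j
        # At most one of ! and ~, and it must come first.
        negated = token[:1] in ("!", "~")
        if negated:
            token = token[1:]
        # Then an optional % for case sensitivity.
        case_sensitive = token.startswith("%")
        if case_sensitive:
            token = token[1:]
        if token not in TAKES_ARGUMENT:
            raise ValueError(f"unknown command: {token!r}")
        argument = None
        if TAKES_ARGUMENT[token]:
            while i < n and expr[i] == " ":
                i += 1
            if i < n and expr[i] == '"':
                # Quoted argument: runs to the next double quote,
                # with no way to escape one.
                end = expr.find('"', i + 1)
                if end == -1:
                    raise ValueError("unterminated quote")
                argument = expr[i + 1:end]
                i = end + 1
            else:
                # Bare argument: runs to the next space character.
                end = expr.find(" ", i)
                if end == -1:
                    end = n
                argument = expr[i:end]
                i = end
        terms.append((negated, case_sensitive, token, argument))
    return terms


if __name__ == "__main__":
    print(parse_terms('f foo & ~s bar'))
    print(parse_terms('!%f "John Beavis" ~&'))
```

Because & and | come out as ordinary terms in a flat list, the sketch also shows why there is nothing to nest and why a prefixed ~& passes through without complaint.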

Regarding regex syntax:

  • It uses regcomp from regex.h with the REG_EXTENDED flag, so the flavor is POSIX Extended.

Regarding numeric units:

  • Colors are something terrible (a message's "color value" is an integer whose parts are stored in different places; requires bitwise operations to reassemble). I'm just going to assume that colors are numbered according to their appearance in the Message > Color label submenu.
  • Age is internally stored with a resolution of one second, but the search operators work in whole days.
  • Size is the raw message size in bytes.

Conclusion:

If you're an end user, claws-mail may not be for you. Source diving seems to be the best way to get accurate and complete information on how the program works.

About mauke

I blog about Perl.