Converting glob patterns to regular expressions

Let's say you have a glob pattern with shell-style wildcards from a config file or user input, where ? matches any character and * matches any string (0 or more characters). You want to convert it to a regex, maybe because you just want to match it (and Perl already supports regexes) or because you want to embed it as part of a bigger regex.

You might start with a naive replacement:

s/\?/./g;   # ? -> .
s/\*/.*/g;  # * -> .*

Unfortunately this is broken: It leaves all other characters untouched, including those that have a special meaning in regexes, such as (, +, |, etc.

Let's revise it:

s{(\W)}{
    $1 eq '?' ? '.' :
    $1 eq '*' ? '.*' :
    '\\' . $1
}eg;

Now we match and replace every non-word character. If it's ? or *, we turn it into its regex equivalent; otherwise we backslash-escape it just like quotemeta would do.

But what if the input is something like a***b? This would turn into a.*.*.*b, which when run on a long target string without bs by a backtracking engine can be very inefficient (barring extra optimizations). A missing b would make the match fail at the end, which would cause the engine to go through all possible ways .*.*.* could subdivide the string amongst themselves before giving up. In general this takes O(nk) time (where n is the length of the target string and k is the number of stars in the pattern).

We can do better than that by realizing ** is equivalent to *, which means that any sequence of stars is equivalent to a single *, and preprocessing the pattern:

tr,*,,s;  # ***...* -> *

This still doesn't fix everything, though: *?*?* doesn't contain any repeated *s but still allows for exponential backtracking. One way to work around this is to normalize the pattern even further: Because *? is equivalent to ?*, we can move all the ?s to the front:

# "*?*?*"
1 while s/\*\?/?*/g;
# "?*?**"  (after 1 iteration)
# "??***"  (after 2 iterations)
tr,*,,s;
# "??*"
s{(\W)}{
    $1 eq '?' ? '.' :
    $1 eq '*' ? '.*' :
    '\\' . $1
}eg;
# "...*"

However, I don't like that the transformation is spread out over two regex substitutions and one transliteration, when there is a way to do it all in a single substitution:

s{
    ( [?*]+ )  # a run of ? or * characters
|
    (\W)       # any other non-word character
}{
    defined $1
        ? '.{' . ($1 =~ tr,?,,) . (index($1, '*') >= 0 ? ',' : '') . '}'
        : '\\' . $2
}xeg;

That is, we turn each run of ? or * characters into .{N} (if there was no *) or .{N,} (if there was at least one *) where N is the number of ?s in the run.

Given an input of *?*?*, this would generate .{2,} ("match 2 or more of any character").

And finally, if we wanted the user to be able to escape characters with a backslash to match them literally:

s{
    ( [?*]+ )  # a run of ? or * characters
|
    \\ (.)     # backslash escape
|
    (\W)       # any other non-word character
}{
    defined $1
        ? '.{' . ($1 =~ tr,?,,) . (index($1, '*') >= 0 ? ',' : '') . '}'
        : quotemeta $+
}xeg;

Fun with logical expressions 2: Electric boogaloo

tybalt89 discovered a bug in Fun with logical expressions:

$ echo IEabEba | perl try.pl
ORIG: IEabEba
MOD: V~V~V~a~b;~V~~a~~b;;V~V~b~a;~V~~b~~a;;;
...> V~V~V~a~b;~Vab;;V~V~b~a;~Vba;;;
...> V~V~V~a~b;~Vab;;~V~b~a;~Vba;;
Not a tautology

(a equals b) implies (b equals a) is a tautology but the program fails to recognize it. With the fixed code below it generates this output instead:

$ echo IEabEba | perl taut.pl
ORIG: IEabEba
MOD: V~V~V~a~b;~V~~a~~b;;V~V~b~a;~V~~b~~a;;;
...> V~V~V~a~b;~Vab;;V~V~b~a;~Vba;;;
...> V~V~V~a~b;~Vab;;~V~b~a;~Vba;;
...> V~Vba;~V~V~a~b;~Vab;;~V~b~a;;
...> V~Vab;~V~Vab;~V~a~b;;~V~a~b;;
...> V~Vab;~V0~V~a~b;;~V~a~b;;
...> V~Vab;~V~V~a~b;;~V~a~b;;
...> V~Vab;~~V~a~b;~V~a~b;;
...> V~Vab;V~a~b;~V~a~b;;
...> V~Vab;~a~b~V~a~b;;
...> V~Vab;~a~b~V0~b;;
...> V~Vab;~a~b~V~b;;
...> V~Vab;~a~b~~b;
...> V~Vab;~a~bb;
...> V~V1b;~a~bb;
...> V~1~a~bb;
...> V0~a~bb;
...> V~a~bb;
...> V~a~b1;
...> 1
Tautology

The following changes were made to the code:

  1. The "regex libraries" $rawlib and $modlib and their named subpatterns (?&rawexpr) and (?&exp) are gone. They were replaced by $rawexpr and $exp, subregexes that directly match and capture a simplifed and modified expression, respectively.

    This change was made in order to make it possible to split a string into subexpressions using my @expr = $str =~ /$exp/g (i.e. m//g in list context).

  2. Two regexes were simplified by using \K.

  3. The old "final rule" did some duplicate work: It used index($1, $3) < 0 && index($4, $3) < 0 inside the regex to search for the subexpression $3 in both $1 and $4. If successful, it repeated the search in the replacement part: s{ \Q$x\E }{$spec}g for $pre, $post;

    The new version uses s/// directly in the regex and checks the return value to see if any match was found/replaced. The replacement string is assembled directly in the regex and saved in an outer lexical variable to make it available in the right-hand side.

  4. A new rule was added. If the other rules get stuck, it reorders V operands. The canonical order chosen is simply the default sort behavior: lexicographically ascending strings.

Due to change #3 and #4 the code no longer works on old perls (before v5.18) because in v5.18 the implementation of embedded code blocks in regexes was rewritten, fixing many bugs.

The new code:

#!/usr/bin/env perl
use v5.18.0;
use warnings;
use re '/xms';

my $rawexpr = qr{
    (
        (?>
            [a-z]
        |
            N (?-1)
        |
            [CDIE] (?-1) (?-1)
        )
    )
};

my $exp = qr{
    (
        (?>
            [01a-z]
        |
            ~ (?-1)
        |
            V (?-1)*+ ;
        )
    )
};

while (readline) {
    chomp;
    say "ORIG: $_";

    1 while s{ E $rawexpr $rawexpr }{DC$1$2CN$1N$2}g;
    1 while s{ I $rawexpr $rawexpr }{DN$1$2}g;
    1 while s{ C $rawexpr $rawexpr }{NDN$1N$2}g;
    1 while s{ D $rawexpr $rawexpr }{V$1$2;}g;
    tr/N/~/;

    say "MOD: $_";

    say "...> $_" while 0
        || s{ ~ ~ }{}g
        || s{ ~ 0 }{1}g
        || s{ ~ 1 }{0}g
        || s{ V ; }{0}g
        || s{ V $exp ; }{$1}g
        || s{ V $exp* \K V ($exp*+) ; }{$2}g
        || s{ V $exp* \K 0 }{}g
        || s{ V $exp* 1 $exp*+ ; }{1}g
        || do {
            my $repl;
            s{
                V ($exp*?) (~??) $exp ($exp*+) ;
                (?(?{
                    my ($pre, $neg, $x, $post) = ($1, $3, $4, $5);
                    my $spec = $neg ? '1' : '0';
                    my $n = 0;
                    $n += s{ \Q$x\E }{$spec}g for $pre, $post;
                    $repl = "V$pre$neg$x$post;";
                    !$n
                }) (*FAIL) )
            }{$repl}g
        }
        || do {
            my $canon;
            s{
                V ($exp {2,}+) ;
                (?(?{
                    my $orig = $1;
                    $canon = join '', sort $orig =~ m{ $exp }g;
                    $orig eq $canon
                }) (*FAIL) )
            }{V$canon;}g
        }
    ;

    say $_ eq '1' ? "Tautology" : "Not a tautology", "\n";
}

__END__

Functional fun with logical expressions

Here's a quickly thrown-together version of "Fun with logical expressions", written in Haskell:

{-# LANGUAGE LambdaCase #-}

module Main (main) where
import Control.Applicative (Applicative (..), (<$>))
import Control.Arrow       (first)
import Data.Ix             (inRange)
import Control.Monad       (guard, mplus, msum)
import Data.Maybe          (fromMaybe, isJust, listToMaybe)

data RawExpr
    = VarR Char
    | N RawExpr
    | D RawExpr RawExpr
    | C RawExpr RawExpr
    | I RawExpr RawExpr
    | E RawExpr RawExpr
    deriving (Eq, Ord, Read, Show)

newtype Parser a = Parser{ unParser :: String -> (a, String) }

instance Functor Parser where
    fmap f (Parser p) = Parser (first f . p)

instance Applicative Parser where
    pure x = Parser (\s -> (x, s))
    Parser f <*> Parser x = Parser (\s ->
        let (f', s')  = f s
            (x', s'') = x s'
        in (f' x', s''))

instance Monad Parser where
    return = pure
    Parser x >>= f = Parser (\s ->
        let (x', s') = x s
        in unParser (f x') s')

anyChar :: Parser Char
anyChar = Parser (\(c : cs) -> (c, cs))

parseRaw :: String -> RawExpr
parseRaw s = case unParser expr s of
    (x, "")   -> x
    (_, rest) -> error ("parseRaw: trailing garbage: " ++ show rest)
    where
    expr = anyChar >>= \case
        c | inRange ('a', 'z') c -> pure (VarR c)
        'N' -> N <$> expr
        'D' -> D <$> expr <*> expr
        'C' -> C <$> expr <*> expr
        'I' -> I <$> expr <*> expr
        'E' -> E <$> expr <*> expr

data ModExpr
    = VarM Char
    | T
    | F
    | Not ModExpr
    | Or [ModExpr]
    deriving (Eq, Ord, Read, Show)

desugar :: RawExpr -> ModExpr
desugar = \case
    VarR v -> VarM v
    N x    -> not (desugar x)
    D x y  -> or  (desugar x) (desugar y)
    C x y  -> and (desugar x) (desugar y)
    I x y  -> or (not (desugar x)) (desugar y)
    E x y  -> or (and x' y') (and (not x') (not y'))
        where
        x' = desugar x
        y' = desugar y
    where
    not     = Not
    or  x y = Or [x, y]
    and x y = not (or (not x) (not y))

render :: ModExpr -> String
render e = go e "" where
    go = \case
        VarM c -> (c :)
        T      -> ('1' :)
        F      -> ('0' :)
        Not x  -> ('~' :) . go x
        Or xs  -> ('V' :) . foldr (.) id (map go xs) . (';' :)

type Rule = ModExpr -> Maybe ModExpr

mkRule :: (Maybe ModExpr -> Maybe ModExpr) -> Rule
mkRule t = rule
    where
    rule e = t (Just e) `mplus` rec
        where
        rec = case e of
            Not x -> Not <$> rule x
            Or xs -> do
                let ys = map rule xs
                guard (any isJust ys)
                return (Or (zipWith fromMaybe xs ys))
            _ -> Nothing

not_not, not_true, not_false :: Rule
not_not = mkRule $ \e -> do
    Not (Not x) <- e
    pure x
not_true = mkRule $ \e -> do
    Not T <- e
    pure F
not_false = mkRule $ \e -> do
    Not F <- e
    pure T

or_true, or_false :: Rule
or_true = mkRule $ \e -> do
    Or xs <- e
    guard (T `elem` xs)
    pure T
or_false = mkRule $ \e -> do
    Or xs <- e
    let ys = filter (F /=) xs
    guard (length ys < length xs)
    pure $ Or ys

first3 :: (a -> r) -> (a, b, c) -> (r, b, c)
first3 f (x, y, z) = (f x, y, z)

second3 :: (b -> r) -> (a, b, c) -> (a, r, c)
second3 f (x, y, z) = (x, f y, z)

selectSplit :: [a] -> [([a], a, [a])]
selectSplit []       = []
selectSplit (x : xs) = ([], x, xs) : map (first3 (x :)) (selectSplit xs)

extract :: (a -> Maybe b) -> [a] -> Maybe ([a], b, [a])
extract f xs = msum (map (evert . second3 f) (selectSplit xs))
    where
    evert (x, Just y, z) = Just (x, y, z)
    evert _              = Nothing

or_none, or_one, or_assoc :: Rule
or_none = mkRule $ \e -> do
    Or [] <- e
    pure F
or_one = mkRule $ \e -> do
    Or [x] <- e
    pure x
or_assoc = mkRule $ \e -> do
    Or xs <- e
    (pre, ys, post) <- extractOr xs
    pure $ Or (pre ++ ys ++ post)
    where
    extractOr = extract (\case Or xs -> Just xs; _ -> Nothing)

isSubExpr :: ModExpr -> ModExpr -> Bool
isSubExpr e1 e2 =
    e1 == e2 || case e2 of
        Not x -> e1 `isSubExpr` x
        Or xs -> any (e1 `isSubExpr`) xs
        _ -> False

substitute :: ModExpr -> ModExpr -> ModExpr -> ModExpr
substitute x y = go
    where
    go = \case
        e | e == x -> y
        Not e      -> Not (go e)
        Or es      -> Or (map go es)
        e          -> e

or_spec :: Rule
or_spec = mkRule $ \e -> do
    Or xs <- e
    listToMaybe $ do
        (pre, x_orig, post) <- selectSplit xs
        (Not x, r) <- [(x_orig, T), (Not x_orig, F)]
        guard $ any (x `isSubExpr`) (pre ++ post)
        let z = Or (map (substitute x r) pre ++ [x_orig] ++ map (substitute x r) post)
        pure z

rules :: [Rule]
rules =
    [
        not_not, not_true, not_false,
        or_true, or_false,
        or_none, or_one, or_assoc,
        or_spec
    ]

simplify1 :: ModExpr -> Maybe ModExpr
simplify1 e = msum (map ($ e) rules)

simplify :: ModExpr -> (ModExpr, [String])
simplify e = case simplify1 e of
    Nothing -> (e, [])
    Just e' -> (render e' :) <$> simplify e'

process :: String -> String
process s = unlines $
    ("ORIG: " ++ s) :
    ("MOD: " ++ render e) :
    foldr (\x z -> ("...> " ++ x) : z) [v, ""] ts
    where
    e = desugar (parseRaw s)
    (r, ts) = simplify e
    v | r == T    = "Tautology"
      | otherwise = "Not a tautology"

main :: IO ()
main = interact (unlines . map process . lines)

Fun with logical expressions

Addendum 2: This set of rules is incomplete. See Fun with logical expressions 2 for an improved version.

Addendum: I've written a more or less equivalent Haskell version of this program.

Some programmers, when confronted with a boolean expression, think "I know, I'll use regular expressions". Now they have true problems.

This post was inspired by the TAUT exercise on SPOJ.

You're given a boolean expression consisting of variables (each being either true or false), a unary operator (not), and several binary operators (and, or, implies, equals). Your task is to determine whether the expression is a tautology, i.e. whether it evaluates to true for all possible variable values.

To make parsing easier, we're going to use a very simplified syntax:

  • Variables are represented by lowercase Latin letters (a-z), i.e. an expression can use at most 26 different variables
  • Operators are represented by uppercase letters: N for not, C (conjunction) for and, D (disjunction) for or, I for implies, E for equals
  • Operators precede their operands: x and not y is written as CxNy, etc.

For example, the expression ((a implies b) and (b implies c)) implies (a implies c) becomes ICIabIbcIac.

There is a number of ways to solve this problem, e.g. using brute force to enumerate all possible variable assignments and test the formula, but we're going to use algebraic simplification to determine directly whether a given expression reduces to true. To make it more fun, we're going to implement it as a string rewriting system using Perl's regex engine.

In the following sections I'm going to take you through the code sections in an order that makes it easier to explain. The full runnable code is at the end.

#!/usr/bin/env perl
use v5.14.0;
use warnings;
use re '/xms', $^V lt v5.18.0 ? 'eval' : ();

The header: The #! line instructs the system to use whatever perl is in the path. use v5.14.0 requires at least version 14 of perl5 (any earlier version will abort parsing with an error at this point). This enables a few features new in version 14 and also turns on strict (since v5.12). use warnings enables warnings - almost always a good idea. use re '/xms' enables the xms flags by default for all regexes in the current scope (this feature was introduced in v5.14). Finally, for perls before v5.18, we use re 'eval' to work around a bug related to parsing of (?{ }) blocks in regexes.

while (readline) {
    chomp;
    say "ORIG: $_";

We iterate over the lines in ARGV (formed by the contents of the files listed in @ARGV or STDIN if @ARGV is empty). From each line (stored in $_) we remove the trailing newline and print it to STDOUT so we can see what we're working on.

Our next goal is to reduce the number of possible operators in the expression. We do that by rewriting everything in terms of N (not) and D (or).

The first step is getting rid of E: What does it mean for two expressions x, y to be equal? Well, either they're both true or they're both false, so the first part is simply x and y. If they're both false, their negations must be true, giving us not x and not y. Putting it all together, X equals Y can be rewritten as (X and Y) or (not X and not Y), which in the simplified syntax is Exy and DCxyCNxNy, respectively.

    1 while s{ E ((?&rawexpr)) ((?&rawexpr)) $rawlib }{DC$1$2CN$1N$2}g;

I'm going to explain the (?&rawexpr) and $rawlib bits in a moment. For now, it suffices to say that (?&rawexpr) is a regex that matches an expression in the simplified syntax and $rawlib contains the definition of (?&rawexpr).

We have a few spaces in our regex. These are just for readability and ignored because of the /x flag, which we enabled for all regexes in our main file.

We match E followed by two subexpressions, which we capture as $1 and $2. We replace it by DC$1$2CN$1N$2, which (as explained above) is equivalent. The /g flag makes s/// replace all occurrences, so what is the outer while loop for? It's for nested expressions of the form ExEyz: x would be matched as $1 and Eyz as $2, but we're not recursively replacing E inside of $1 or $2, so at the end we still have an instance of E left. Instead of trying to write some kind of nested recursive substitution, we just wrap the whole thing in a loop that tries again if any match/replacement happened in the previous iteration, until all the Es are gone.

    1 while s{ I ((?&rawexpr)) ((?&rawexpr)) $rawlib }{DN$1$2}g;

Next we get rid of I. x implies y is only false if x is true and y is false; all other cases are true. This corresponds to not x or y, which is DNxy in the simplified syntax.

    1 while s{ C ((?&rawexpr)) ((?&rawexpr)) $rawlib }{NDN$1N$2}g;

Now we get rid of C using De Morgan's law: not (x and y) = not x or not y. Adding a not on both sides gives us not (not (x and y)) = not (not x or not y). The two nots on the left side cancel out, giving x and y = not (not x or not y), which is NDNxNy in the simplified syntax.

Now for the definition of $rawlib, which makes use of regex features introduced in v5.10:

my $rawlib = qr{
    (?(DEFINE)
        (?<rawexpr>
            (?>
                [a-z]
            |
                N (?&rawexpr)
            |
                [CDIE] (?&rawexpr) (?&rawexpr)
            )
        )
    )
};

This says the whole regex is in a (?(DEFINE) ... ) block, which has no effect on matching. It merely provides definitions for use in the rest of the regex (i.e. $rawlib is effectively a regex library). Using (?<rawexpr> ... ) we define a named subpattern called rawexpr: A rawexpr is either a character from the range a-z, or N followed by a rawexpr, or one of CDIE followed by two rawexprs. The (?&rawexpr) syntax lets us invoke named subpatterns recursively.

By including $rawlib in the regexes above we made the name rawexpr visible in the surrounding regex, which allowed us to simply write (?&rawexpr) to match a non-trivial piece of the input.

At this point in the program we've reduced the input expression to nothing but combinations of variables, N, and D. To proceed further, we're going to switch from a 2-operand or (D) to an operator with a variable number of operands (V...;), i.e. we're going to turn Dxy into Vxy;:

    1 while s{ D ((?&rawexpr)) ((?&rawexpr)) $rawlib }{V$1$2;}g;

This line works because even though (?&rawexpr) doesn't match V...;, regexes always find the leftmost match first, and because we're using prefix notation, the first match will be the outermost expression. This means we're rewriting from the outside in, so any subexpressions are guaranteed to not contain V...; yet.

    tr/N/~/;

Finally (because I think it looks better) we turn all the Ns into ~s.

    say "MOD: $_";

At this point we've completely changed from the raw input form to something using variables, ~, and V...;. To confirm everything went well, we output the modified expression.

What follows are the rewrite rules that simplify the expression. We start with another regex library:

my $modlib = qr{
    (?(DEFINE)
        (?<exp>
            (?>
                [01a-z]
            |
                ~ (?&exp)
            |
                V (?&exp)*+ ;
            )
        )
    )
};

A (modified) expression exp is either 0 or 1 (constants representing false and true, respectively), or a variable from the set a-z, or ~ followed by an exp, or V followed by 0 or more exps, followed by ;.

Now the actual rewrite code:

    say "...> $_" while 0

We print the current expression after each substitution step to see what's going on. The 0 after while lets us start the next line with ||, which makes the substitutions look uniform. We repeatedly try to find matches of our rules until none of the regexes matches anymore. At that point we've finished rewriting (and hopefully simplifying) the expression as much as possible.

        || s{ ~ ~ }{}g

All instances of ~~ can be removed (not not cancels out).

        || s{ ~ 0 }{1}g

not false is true.

        || s{ ~ 1 }{0}g

not true is false.

        || s{ V ; }{0}g

or [] with no operands is false (because false is the neutral element for or).

        || s{ V ((?&exp)) ; $modlib }{$1}g

or [x] with a single operand is just that operand (x).

        || s{ V ((?&exp)*) V ((?&exp)*) ; $modlib }{V$1$2}g

or [@x, or [@y], @z] is the same as or [@x, @y, @z] (using @x to stand for "a list of expressions called x"). This is essentially an associativity rule that eliminates nested ors.

        || s{ V (?&exp)* 1 (?&exp)*+ ; $modlib }{1}g

or [@x, true, @y] is true; i.e. an or with true among its operands is itself true.

        || s{ V ((?&exp)*) 0 $modlib }{V$1}g

or [@x, false, @y] is or [@x, @y]; i.e. false can be eliminated if it's an operand of or.

The final rule is quite a mouthful so I'm going to provide inline commentary:

        || s{
            V ((?&exp)*?) (~??) ((?&exp)) ((?&exp)*+) ;

We match V, followed by 0 or more expressions, followed by an (optionally negated) expression, followed by 0 or more expressions, followed by ;. So far this is really general (it matches all ors with at least one operand).

            (?(?{
                index($1, $3) < 0 &&
                index($4, $3) < 0
            }) (*FAIL) )

This is an instance of the (?(COND)YES-PATTERN) regex syntax, which lets us match a regex conditionally. Our condition has the form (?{ ... }), which embeds arbitrary Perl code into the regex. If the condition is true, we match the regex (*FAIL), which simply fails right away. This is effectively a (negated) assertion: If the condition is false, nothing happens; if it is true, the match fails (and we backtrack).

The whole thing asserts that the subexpression $3 can be found somewhere in either $1 (the list of expressions before $3) or in $4 (the list of expressions after $3).

Taken together, we're looking for or [@x, y, @z] or or [@x, not y, @z] where y occurs in @x or @z, either directly or nested in a subexpression.

            $modlib
        }{
            my ($pre, $neg, $x, $post) = ($1, $2, $3, $4);

Having found such a match, we can now do a bit of simplification. The basic idea is that in or [@x, y, @z], all occurrences of y in @x and @z can be replaced by false. Why? Because y is either true or false. If it's true, or [@x, true, @z] is true no matter what we do to @x or @z. If it's false, replacing y by false is a no-op. Therefore replacing y by false in @x and @z doesn't change the outcome in either case.

Similar reasoning applies to or [@x, not y, @z], only we replace y by true in @x and @z (instead of false).

            my $spec = $neg ? '1' : '0';

If the original match was negated, the replacement is true, else false.

            s{ \Q$x\E }{$spec}g for $pre, $post;

Replace y in both @x and @z.

            "V$pre$neg$x$post;"

Reassemble the expression from the modified pieces.

        }ge
    ;

    say $_ eq '1' ? "Tautology" : "Not a tautology", "\n";
}

Finally, if the whole thing was reduced to 1 (i.e. true), say that it is tautological.

For a practical demonstration, here's the output from the sample input at SPOJ:

ORIG: IIpqDpNp
MOD: V~V~pq;Vp~p;;
...> V~V~pq;p~p;
...> V~V~0q;p~0;
...> V~V1q;p1;
...> 1
Tautology

ORIG: NCNpp
MOD: ~~V~~p~p;
...> Vp~p;
...> Vp~0;
...> Vp1;
...> 1
Tautology

ORIG: Iaz
MOD: V~az;
Not a tautology

ORIG: NNNNNNNp
MOD: ~~~~~~~p
...> ~p
Not a tautology

ORIG: IIqrIIpqIpr
MOD: V~V~qr;V~V~pq;V~pr;;;
...> V~V~qr;~V~pq;V~pr;;
...> V~V~qr;~V~pq;~pr;
...> V~V~qr;~V0q;~pr;
...> V~V~qr;~Vq;~pr;
...> V~V~qr;~q~pr;
...> V~V0r;~q~pr;
...> V~Vr;~q~pr;
...> V~r~q~pr;
...> V~r~q~p1;
...> 1
Tautology

ORIG: Ipp
MOD: V~pp;
...> V~p1;
...> 1
Tautology

ORIG: Ezz
MOD: V~V~z~z;~V~~z~~z;;
...> V~V~z~z;~Vzz;;
...> V~V~z0;~Vz0;;
...> V~V~z;~Vz;;
...> V~~z~z;
...> Vz~z;
...> Vz~0;
...> Vz1;
...> 1
Tautology

The complete program:

#!/usr/bin/env perl
use v5.14.0;
use warnings;
use re '/xms', $^V lt v5.18.0 ? 'eval' : ();

my $rawlib = qr{
    (?(DEFINE)
        (?<rawexpr>
            (?>
                [a-z]
            |
                N (?&rawexpr)
            |
                [CDIE] (?&rawexpr) (?&rawexpr)
            )
        )
    )
};

my $modlib = qr{
    (?(DEFINE)
        (?<exp>
            (?>
                [01a-z]
            |
                ~ (?&exp)
            |
                V (?&exp)*+ ;
            )
        )
    )
};

while (readline) {
    chomp;
    say "ORIG: $_";

    1 while s{ E ((?&rawexpr)) ((?&rawexpr)) $rawlib }{DC$1$2CN$1N$2}g;
    1 while s{ I ((?&rawexpr)) ((?&rawexpr)) $rawlib }{DN$1$2}g;
    1 while s{ C ((?&rawexpr)) ((?&rawexpr)) $rawlib }{NDN$1N$2}g;
    1 while s{ D ((?&rawexpr)) ((?&rawexpr)) $rawlib }{V$1$2;}g;
    tr/N/~/;

    say "MOD: $_";

    say "...> $_" while 0
        || s{ ~ ~ }{}g
        || s{ ~ 0 }{1}g
        || s{ ~ 1 }{0}g
        || s{ V ; }{0}g
        || s{ V ((?&exp)) ; $modlib }{$1}g
        || s{ V ((?&exp)*) V ((?&exp)*) ; $modlib }{V$1$2}g
        || s{ V (?&exp)* 1 (?&exp)*+ ; $modlib }{1}g
        || s{ V ((?&exp)*) 0 $modlib }{V$1}g
        || s{
            V ((?&exp)*?) (~??) ((?&exp)) ((?&exp)*+) ;
            (?(?{
                index($1, $3) < 0 &&
                index($4, $3) < 0
            }) (*FAIL) )
            $modlib
        }{
            my ($pre, $neg, $x, $post) = ($1, $2, $3, $4);
            my $spec = $neg ? '1' : '0';
            s{ \Q$x\E }{$spec}g for $pre, $post;
            "V$pre$neg$x$post;"
        }ge
    ;

    say $_ eq '1' ? "Tautology" : "Not a tautology", "\n";
}

__END__

Spam from waag.org

Today I got this spam message:

From: Tessel van Leeuwen <tessel@waag.org>
Subject: Invitation: Join the Apps for Europe competition and win 5,000 euro!
To: undisclosed-recipients:;

Hello,

I'm writing to you because I found your email address on the website, http://act.useperl.at/apw2014/wiki?node=HackaThon, and I would like to invite you to enter the Apps for Europe competition: ... <snip>

There are a couple of things about this I find interesting:

  1. As far as I can tell, my email address is not on the linked page.
  2. The message has no visible recipients (so I guess my address was listed in BCC).
  3. I've never done anything with apps.
  4. The Austrian Perl Workshop was 2 months ago. Why harvest the hackathon page for email addresses now?