## Fun with logical expressions

Addendum 2: This set of rules is incomplete. See Fun with logical expressions 2 for an improved version.

Addendum: I've written a more or less equivalent Haskell version of this program.

Some programmers, when confronted with a boolean expression, think "I know, I'll use regular expressions". Now they have `true` problems.

This post was inspired by the TAUT exercise on SPOJ.

You're given a boolean expression consisting of variables (each being either `true` or `false`), a unary operator (`not`), and several binary operators (`and`, `or`, `implies`, `equals`). Your task is to determine whether the expression is a tautology, i.e. whether it evaluates to `true` for all possible variable values.

To make parsing easier, we're going to use a very simplified syntax:

• Variables are represented by lowercase Latin letters (`a`-`z`), i.e. an expression can use at most 26 different variables
• Operators are represented by uppercase letters: `N` for `not`, `C` (conjunction) for `and`, `D` (disjunction) for `or`, `I` for `implies`, `E` for `equals`
• Operators precede their operands: `x and not y` is written as `CxNy`, etc.

For example, the expression `((a implies b) and (b implies c)) implies (a implies c)` becomes `ICIabIbcIac`.

There is a number of ways to solve this problem, e.g. using brute force to enumerate all possible variable assignments and test the formula, but we're going to use algebraic simplification to determine directly whether a given expression reduces to `true`. To make it more fun, we're going to implement it as a string rewriting system using Perl's regex engine.

In the following sections I'm going to take you through the code sections in an order that makes it easier to explain. The full runnable code is at the end.

``````#!/usr/bin/env perl
use v5.14.0;
use warnings;
use re '/xms', \$^V lt v5.18.0 ? 'eval' : ();
``````

The header: The `#!` line instructs the system to use whatever perl is in the path. `use v5.14.0` requires at least version 14 of perl5 (any earlier version will abort parsing with an error at this point). This enables a few features new in version 14 and also turns on `strict` (since v5.12). `use warnings` enables warnings - almost always a good idea. `use re '/xms'` enables the `xms` flags by default for all regexes in the current scope (this feature was introduced in v5.14). Finally, for perls before v5.18, we `use re 'eval'` to work around a bug related to parsing of `(?{ })` blocks in regexes.

``````while (readline) {
chomp;
say "ORIG: \$_";
``````

We iterate over the lines in `ARGV` (formed by the contents of the files listed in `@ARGV` or `STDIN` if `@ARGV` is empty). From each line (stored in `\$_`) we remove the trailing newline and print it to `STDOUT` so we can see what we're working on.

Our next goal is to reduce the number of possible operators in the expression. We do that by rewriting everything in terms of `N` (`not`) and `D` (`or`).

The first step is getting rid of `E`: What does it mean for two expressions x, y to be equal? Well, either they're both `true` or they're both `false`, so the first part is simply `x and y`. If they're both `false`, their negations must be `true`, giving us `not x and not y`. Putting it all together, `X equals Y` can be rewritten as `(X and Y) or (not X and not Y)`, which in the simplified syntax is `Exy` and `DCxyCNxNy`, respectively.

``````    1 while s{ E ((?&rawexpr)) ((?&rawexpr)) \$rawlib }{DC\$1\$2CN\$1N\$2}g;
``````

I'm going to explain the `(?&rawexpr)` and `\$rawlib` bits in a moment. For now, it suffices to say that `(?&rawexpr)` is a regex that matches an expression in the simplified syntax and `\$rawlib` contains the definition of `(?&rawexpr)`.

We have a few spaces in our regex. These are just for readability and ignored because of the `/x` flag, which we enabled for all regexes in our main file.

We match `E` followed by two subexpressions, which we capture as `\$1` and `\$2`. We replace it by `DC\$1\$2CN\$1N\$2`, which (as explained above) is equivalent. The `/g` flag makes `s///` replace all occurrences, so what is the outer `while` loop for? It's for nested expressions of the form `ExEyz`: `x` would be matched as `\$1` and `Eyz` as `\$2`, but we're not recursively replacing `E` inside of `\$1` or `\$2`, so at the end we still have an instance of `E` left. Instead of trying to write some kind of nested recursive substitution, we just wrap the whole thing in a loop that tries again if any match/replacement happened in the previous iteration, until all the `E`s are gone.

``````    1 while s{ I ((?&rawexpr)) ((?&rawexpr)) \$rawlib }{DN\$1\$2}g;
``````

Next we get rid of `I`. `x implies y` is only `false` if `x` is `true` and `y` is `false`; all other cases are `true`. This corresponds to `not x or y`, which is `DNxy` in the simplified syntax.

``````    1 while s{ C ((?&rawexpr)) ((?&rawexpr)) \$rawlib }{NDN\$1N\$2}g;
``````

Now we get rid of `C` using De Morgan's law: `not (x and y) = not x or not y`. Adding a `not` on both sides gives us `not (not (x and y)) = not (not x or not y)`. The two `not`s on the left side cancel out, giving `x and y = not (not x or not y)`, which is `NDNxNy` in the simplified syntax.

Now for the definition of `\$rawlib`, which makes use of regex features introduced in v5.10:

``````my \$rawlib = qr{
(?(DEFINE)
(?<rawexpr>
(?>
[a-z]
|
N (?&rawexpr)
|
[CDIE] (?&rawexpr) (?&rawexpr)
)
)
)
};
``````

This says the whole regex is in a `(?(DEFINE) ... )` block, which has no effect on matching. It merely provides definitions for use in the rest of the regex (i.e. `\$rawlib` is effectively a regex library). Using `(?<rawexpr> ... )` we define a named subpattern called `rawexpr`: A `rawexpr` is either a character from the range `a-z`, or N followed by a `rawexpr`, or one of `CDIE` followed by two `rawexpr`s. The `(?&rawexpr)` syntax lets us invoke named subpatterns recursively.

By including `\$rawlib` in the regexes above we made the name `rawexpr` visible in the surrounding regex, which allowed us to simply write `(?&rawexpr)` to match a non-trivial piece of the input.

At this point in the program we've reduced the input expression to nothing but combinations of variables, `N`, and `D`. To proceed further, we're going to switch from a 2-operand `or` (`D`) to an operator with a variable number of operands (`V...;`), i.e. we're going to turn `Dxy` into `Vxy;`:

``````    1 while s{ D ((?&rawexpr)) ((?&rawexpr)) \$rawlib }{V\$1\$2;}g;
``````

This line works because even though `(?&rawexpr)` doesn't match `V...;`, regexes always find the leftmost match first, and because we're using prefix notation, the first match will be the outermost expression. This means we're rewriting from the outside in, so any subexpressions are guaranteed to not contain `V...;` yet.

``````    tr/N/~/;
``````

Finally (because I think it looks better) we turn all the `N`s into `~`s.

``````    say "MOD: \$_";
``````

At this point we've completely changed from the raw input form to something using variables, `~`, and `V...;`. To confirm everything went well, we output the modified expression.

What follows are the rewrite rules that simplify the expression. We start with another regex library:

``````my \$modlib = qr{
(?(DEFINE)
(?<exp>
(?>
[01a-z]
|
~ (?&exp)
|
V (?&exp)*+ ;
)
)
)
};
``````

A (modified) expression `exp` is either `0` or `1` (constants representing `false` and `true`, respectively), or a variable from the set `a-z`, or `~` followed by an `exp`, or `V` followed by 0 or more `exp`s, followed by `;`.

Now the actual rewrite code:

``````    say "...> \$_" while 0
``````

We print the current expression after each substitution step to see what's going on. The `0` after `while` lets us start the next line with `||`, which makes the substitutions look uniform. We repeatedly try to find matches of our rules until none of the regexes matches anymore. At that point we've finished rewriting (and hopefully simplifying) the expression as much as possible.

``````        || s{ ~ ~ }{}g
``````

All instances of `~~` can be removed (`not not` cancels out).

``````        || s{ ~ 0 }{1}g
``````

`not false` is `true`.

``````        || s{ ~ 1 }{0}g
``````

`not true` is `false`.

``````        || s{ V ; }{0}g
``````

`or []` with no operands is `false` (because `false` is the neutral element for `or`).

``````        || s{ V ((?&exp)) ; \$modlib }{\$1}g
``````

`or [x]` with a single operand is just that operand (`x`).

``````        || s{ V ((?&exp)*) V ((?&exp)*) ; \$modlib }{V\$1\$2}g
``````

`or [@x, or [@y], @z]` is the same as `or [@x, @y, @z]` (using `@x` to stand for "a list of expressions called x"). This is essentially an associativity rule that eliminates nested `or`s.

``````        || s{ V (?&exp)* 1 (?&exp)*+ ; \$modlib }{1}g
``````

`or [@x, true, @y]` is `true`; i.e. an `or` with `true` among its operands is itself `true`.

``````        || s{ V ((?&exp)*) 0 \$modlib }{V\$1}g
``````

`or [@x, false, @y]` is `or [@x, @y]`; i.e. `false` can be eliminated if it's an operand of `or`.

The final rule is quite a mouthful so I'm going to provide inline commentary:

``````        || s{
V ((?&exp)*?) (~??) ((?&exp)) ((?&exp)*+) ;
``````

We match `V`, followed by 0 or more expressions, followed by an (optionally negated) expression, followed by 0 or more expressions, followed by `;`. So far this is really general (it matches all `or`s with at least one operand).

``````            (?(?{
index(\$1, \$3) < 0 &&
index(\$4, \$3) < 0
}) (*FAIL) )
``````

This is an instance of the `(?(COND)YES-PATTERN)` regex syntax, which lets us match a regex conditionally. Our condition has the form `(?{ ... })`, which embeds arbitrary Perl code into the regex. If the condition is true, we match the regex `(*FAIL)`, which simply fails right away. This is effectively a (negated) assertion: If the condition is false, nothing happens; if it is true, the match fails (and we backtrack).

The whole thing asserts that the subexpression `\$3` can be found somewhere in either `\$1` (the list of expressions before `\$3`) or in `\$4` (the list of expressions after `\$3`).

Taken together, we're looking for `or [@x, y, @z]` or `or [@x, not y, @z]` where `y` occurs in `@x` or `@z`, either directly or nested in a subexpression.

``````            \$modlib
}{
my (\$pre, \$neg, \$x, \$post) = (\$1, \$2, \$3, \$4);
``````

Having found such a match, we can now do a bit of simplification. The basic idea is that in `or [@x, y, @z]`, all occurrences of `y` in `@x` and `@z` can be replaced by `false`. Why? Because `y` is either `true` or `false`. If it's `true`, `or [@x, true, @z]` is `true` no matter what we do to `@x` or `@z`. If it's `false`, replacing `y` by `false` is a no-op. Therefore replacing `y` by `false` in `@x` and `@z` doesn't change the outcome in either case.

Similar reasoning applies to `or [@x, not y, @z]`, only we replace `y` by `true` in `@x` and `@z` (instead of `false`).

``````            my \$spec = \$neg ? '1' : '0';
``````

If the original match was negated, the replacement is `true`, else `false`.

``````            s{ \Q\$x\E }{\$spec}g for \$pre, \$post;
``````

Replace `y` in both `@x` and `@z`.

``````            "V\$pre\$neg\$x\$post;"
``````

Reassemble the expression from the modified pieces.

``````        }ge
;

say \$_ eq '1' ? "Tautology" : "Not a tautology", "\n";
}
``````

Finally, if the whole thing was reduced to `1` (i.e. `true`), say that it is tautological.

For a practical demonstration, here's the output from the sample input at SPOJ:

``````ORIG: IIpqDpNp
MOD: V~V~pq;Vp~p;;
...> V~V~pq;p~p;
...> V~V~0q;p~0;
...> V~V1q;p1;
...> 1
Tautology

ORIG: NCNpp
MOD: ~~V~~p~p;
...> Vp~p;
...> Vp~0;
...> Vp1;
...> 1
Tautology

ORIG: Iaz
MOD: V~az;
Not a tautology

ORIG: NNNNNNNp
MOD: ~~~~~~~p
...> ~p
Not a tautology

ORIG: IIqrIIpqIpr
MOD: V~V~qr;V~V~pq;V~pr;;;
...> V~V~qr;~V~pq;V~pr;;
...> V~V~qr;~V~pq;~pr;
...> V~V~qr;~V0q;~pr;
...> V~V~qr;~Vq;~pr;
...> V~V~qr;~q~pr;
...> V~V0r;~q~pr;
...> V~Vr;~q~pr;
...> V~r~q~pr;
...> V~r~q~p1;
...> 1
Tautology

ORIG: Ipp
MOD: V~pp;
...> V~p1;
...> 1
Tautology

ORIG: Ezz
MOD: V~V~z~z;~V~~z~~z;;
...> V~V~z~z;~Vzz;;
...> V~V~z0;~Vz0;;
...> V~V~z;~Vz;;
...> V~~z~z;
...> Vz~z;
...> Vz~0;
...> Vz1;
...> 1
Tautology
``````

The complete program:

``````#!/usr/bin/env perl
use v5.14.0;
use warnings;
use re '/xms', \$^V lt v5.18.0 ? 'eval' : ();

my \$rawlib = qr{
(?(DEFINE)
(?<rawexpr>
(?>
[a-z]
|
N (?&rawexpr)
|
[CDIE] (?&rawexpr) (?&rawexpr)
)
)
)
};

my \$modlib = qr{
(?(DEFINE)
(?<exp>
(?>
[01a-z]
|
~ (?&exp)
|
V (?&exp)*+ ;
)
)
)
};

chomp;
say "ORIG: \$_";

1 while s{ E ((?&rawexpr)) ((?&rawexpr)) \$rawlib }{DC\$1\$2CN\$1N\$2}g;
1 while s{ I ((?&rawexpr)) ((?&rawexpr)) \$rawlib }{DN\$1\$2}g;
1 while s{ C ((?&rawexpr)) ((?&rawexpr)) \$rawlib }{NDN\$1N\$2}g;
1 while s{ D ((?&rawexpr)) ((?&rawexpr)) \$rawlib }{V\$1\$2;}g;
tr/N/~/;

say "MOD: \$_";

say "...> \$_" while 0
|| s{ ~ ~ }{}g
|| s{ ~ 0 }{1}g
|| s{ ~ 1 }{0}g
|| s{ V ; }{0}g
|| s{ V ((?&exp)) ; \$modlib }{\$1}g
|| s{ V ((?&exp)*) V ((?&exp)*) ; \$modlib }{V\$1\$2}g
|| s{ V (?&exp)* 1 (?&exp)*+ ; \$modlib }{1}g
|| s{ V ((?&exp)*) 0 \$modlib }{V\$1}g
|| s{
V ((?&exp)*?) (~??) ((?&exp)) ((?&exp)*+) ;
(?(?{
index(\$1, \$3) < 0 &&
index(\$4, \$3) < 0
}) (*FAIL) )
\$modlib
}{
my (\$pre, \$neg, \$x, \$post) = (\$1, \$2, \$3, \$4);
my \$spec = \$neg ? '1' : '0';
s{ \Q\$x\E }{\$spec}g for \$pre, \$post;
"V\$pre\$neg\$x\$post;"
}ge
;

say \$_ eq '1' ? "Tautology" : "Not a tautology", "\n";
}

__END__
``````

Fun.