a
^{100} against the pattern (a*
)^{n}b
. (Here exponentiation refers to string repetition, as in 'a' x 100
matched against ('a*' x $n) . 'b'
in Perl syntax.) What he found was that some implementations returned results instantly whereas others got extremely slow as soon as n
grew past 5, taking seconds, minutes and even hours to finish.
This problem is caused by excessive backtracking. It's possible to implement globbing without nested backtracking (and the linked article explains a simple algorithm to do so), but a naive recursive implementation will suffer from this issue. It affects some shells, FTP servers, and programming languages, including Perl: File::Glob
uses code from BSD libc (which is affected, unlike glibc). A patch was written and the next File::Glob
release will include a fixed algorithm.
But what really caught my eye was the way Python implements glob matching: It translates each glob pattern to a regex, then simply invokes the regex engine. This approach also suffers from exponential backtracking because Python's regex engine uses an exponentialtime algorithm. As it turns out I do something very similar in some of my JavaScript code to support userspecified wildcards.
The question is: Does converting the pattern to a regex suffer from the same problem in JavaScript and Perl, and can we avoid excessive backtracking somehow?
At first blush it seems like Perl is not affected:
$ perl e '("a" x 100) =~ /\A(?:a.*?){6}b\z/s'
returns instantly. But this is because a specific regex optimization kicks in: Perl first checks whether the fixed substring 'b'
appears in the target string, and because it doesn't, the match fails immediately. We can defeat this optimization by switching to:
$ perl e '("a" x 100) =~ /\A(?:a.*?){6}[bc]\z/s'
or
$ perl e '("a" x 100) =~ /\A(?:a.*?){6}b.*?a\z/s'
(Warning: This takes 1½ minutes to finish on my system; increasing the 6
to 7
or 8
is not recommended.)
So this approach seems vulnerable:
#!/usr/bin/env perl
use strict;
use warnings;
sub glob2re {
my ($pat) = @_;
$pat =~ s{(\W)}{
$1 eq '?' ? '.' :
$1 eq '*' ? '.*?' :
'\\' . $1
}eg;
return qr/\A$pat\z/s;
}
('a' x 100) =~ glob2re(('a*' x 7) . 'b*a');
The above code indeed takes a long time to finish.
What about JavaScript? Here's the equivalent code:
function glob2re(pat) {
pat = pat.replace(/\W/g, function (m0) {
return (
m0 === '?' ? '[\\s\\S]' :
m0 === '*' ? '[\\s\\S]*?' :
'\\' + m0
);
});
return new RegExp('^' + pat + '$');
}
glob2re('a*'.repeat(7) + 'b').test('a'.repeat(100));
(Note the extra suckiness: JavaScript has no /s
flag so you need something like [\s\S]
to match any character.)
I've only tried this in Firefox but it also takes a long time to finish.
Clearly the naive approach does not work well. Can we fix it?
The algorithm presented by Russ Cox works by limiting the stack of saved backtracking states to at most 1 entry (yeah, that's not much of a "stack" anymore). As soon as a new instance of *
starts being processed, all previous backtracking information is forgotten.
In Perl we can get the same effect by using the (*PRUNE)
control verb:
#!/usr/bin/env perl
use strict;
use warnings;
sub glob2re {
my ($pat) = @_;
$pat =~ s{(\W)}{
$1 eq '?' ? '.' :
$1 eq '*' ? '(*PRUNE).*?' :
'\\' . $1
}eg;
return qr/\A$pat\z/s;
}
('a' x 100) =~ glob2re(('a*' x 70) . 'b*a');
With this tiny change (adding (*PRUNE)
to the generated regex), even 70 wildcards in a single pattern pose no problem: The program finishes instantly.
Again, what about JavaScript? Here the situation is a bit more complicated because JavaScript doesn't support control verbs. Normally this wouldn't be much of a problem because we could just turn foo*bar*baz
into /foo(?>.*?bar)(?>.*?baz)/
instead, using (?>...)
(an independent subexpression) to limit backtracking. Unfortunately JavaScript doesn't support (?>...)
either. But wait! We can combine capturing groups, positive lookahead, and backreferences to simulate (?>foo)
: Just use (?=(foo))\1
instead. Well, what we really want is a backreference to the last thing we captured. Perl again makes this easy with a relative backreference (\g{1}
) but in JavaScript we're forced to use absolute numbering instead. We can still do it because we control the whole regex, we just have to do a bit more manual work:
function glob2re(pat) {
function tr(pat) {
return pat.replace(/\W/g, function (m0) {
return (
m0 === '?' ? '[\\s\\S]' :
'\\' + m0
);
});
}
var n = 1;
pat = pat.replace(/\W[^*]*/g, function (m0, mp, ms) {
if (m0.charAt(0) !== '*') {
return tr(m0);
}
var eos = mp + m0.length === ms.length ? '$' : '';
return '(?=([\\s\\S]*?' + tr(m0.substr(1)) + eos + '))\\' + n++;
});
return new RegExp('^' + pat + '$');
}
glob2re('a*'.repeat(70) + 'b').test('a'.repeat(100));
The code is starting to look a bit crazy and the regexes it generates are even worse, but it does work: Even 70 wildcards finish instantly.
Conclusion: Yes, converting glob patterns to efficient regexes is possible. It's even trivial in Perl. In JavaScript we have to jump through some annoying hoops but in the end we still get a regex that does what we want.
]]>@{[ ]}
are that it looks quite clunky, it's inefficient at runtime (it builds and dereferences an arrayref), it evaluates its contents in list context, and it gets really confusing if you need (nested) quotes in the interpolated part.]]>
Parameter lists / "signatures"
Instead of manually unpacking @_
, you can just write sub foo($bar, $baz) { ... }
to define a function with two arguments.
This feature is available in core perl5 since version 20 (the syntax changed slightly and it produces better error messages since version 22). It's still experimental in version 24 (and produces corresponding warnings when enabled).
However, the CPAN module Function::Parameters adds full support for parameter lists to every perl since 5.14 (albeit with a new keyword (fun
and method
) instead of sub
). It's available right now and not experimental:
use Function::Parameters qw(:strict);
fun foo($bar, $baz) {
...
}
Keyword arguments / named parameters
By defining your subroutine as sub foo(:$state, :$head, :$halt) {}
, you can call it as
foo(
head => 0,
state => 'A',
halt => 'Z',
);
or
foo(
halt => 'Z',
state => 'A',
head => 0,
);
or any argument order you like. You no longer have to remember the position of each argument, which is great, especially if your function takes more than 3 arguments (or you haven't touched the code in a month or three).
This is also available in Function::Parameters from perl 5.14 onwards:
use Function::Parameters qw(:strict);
fun foo(:$state, :$head, :$halt) {
}
Interpolating blocks in strings
Perl 5 lets you interpolate variables in doublequoted strings, which can be very convenient:
say "$greeting, visitor! Would you like some $beverage?";
However, this is limited to variables (scalars and arrays/array slices). There's no way to directly interpolate, say, method or function calls. That's why Perl 6 lets you interpolate arbitrary code in strings by using { blocks }
:
say "2 + 2 = {2 + 2}"; # "2 + 2 = 4"
This feature is available in Quote::Code on CPAN for all perls since 5.14:
use Quote::Code;
say qc"2 + 2 = {2 + 2}";
Funny Unicode variable names
One of the examples (Pollard ρfactorization) uses $ρ
(that's a Greek lowercase rho) as a variable name because Perl 6 supports Unicode in programs by default.
Perl 5 doesn't. By default, that is. But after a use utf8;
declaration, you can put arbitrary Unicode text in your string literals, regexes, etc. And it works for variables, too:
# this works on any perl version >= 5.8
use utf8;
my $ρ = 42;
print $ρ, "\n";
Of course there were many, many other things that are not so easily ported from Perl 6. But I think it's nice how much Just Works (or can be made to work with minimal effort) in existing Perl 5 code.
]]>> Many won't even concede that the behavior I called out was unpleasant in the first place.
This sounds like "Many people disagree with me, how dare they" to me. (I'm one of them.)
 "Condescending, abusive advice"
 "So come on in newbies. You're stupid. You fill us with contempt. Get ready to complain. And most of all, don't waste our precious time."
 "antique, pompous, and arrogant elitists who can't even treat each other with kindness and respect"
Isn't all of this condescending, abusive advice itself?
If you don't have anything nice to say, well:
> Silence is better than any response that isn't nice, honest, and helpful.
]]>Checking the arguments manually is annoying boilerplate code that no user wants to bother with. It's much better to be able to abstract it away and have the language do it for you automatically.
That said, Function::Parameters supports both: use Function::Parameters qw(:strict);
enables checks, but use Function::Parameters qw(:lax)
doesn't, just like my ($x, $y, $z) = @_;
.
EXPR if EXPR;
EXPR unless EXPR;
EXPR while EXPR;
EXPR until EXPR;
EXPR for EXPR;
Perl also has a Cstyle for
loop:
for (INIT; COND; STEP) {
...
}
The curious part: COND
is a normal expression, but STEP
allows statement modifiers. That is, you can write:
for (my $i = 0; $i < 10; $i++ if rand() < 0.5) {
print "$i\n";
}
]]>
You can always use an array reference, in curly braces, in place of the name of an array.
This leads to code like ${$foo}
(dereference a scalar reference) or @{$bar{baz}}
(dereference an array reference stored in a hash).
The curious part: The curly braces actually form a block, i.e. you can put multiple statements in there (just like do BLOCK
), as long as the last one returns a reference:
% perl E 'use strict; use warnings; ${say "hi"; \$_} = 42; say $_'
hi
42
This block also gets its own scope:
% perl E 'use strict; use warnings; ${my $x = "hi"; say $x; \$x} = 42; say $x'
Global symbol "$x" requires explicit package name at e line 1.
Execution of e aborted due to compilation errors.
$x
isn't visible outside the ${ ... }
block it was declared in.
% perl E 'use strict; use warnings; ${my $x = "hi"; say $x; \$x} = 42;'
hi
]]>
*
specially, no. But it supports many other features (wildcards don't match a leading dot, curlies, etc) plus it uses a rather Clike approach to converting the pattern (iterating over single characters, state machine) so it would take more than 10 seconds of looking at it to make any major changes to the algorithm.]]>
I don't think your version handles the last case with backslash escapes.
]]>[...]
here.]]>
/
isn't treated specially.
(In one place I use this for matching some HTTP header values, in which case .
is actually [!#$%&'*+\.^`~\w]
.)
?
matches any character and *
matches any string (0 or more characters). You want to convert it to a regex, maybe because you just want to match it (and Perl already supports regexes) or because you want to embed it as part of a bigger regex.
You might start with a naive replacement:
s/\?/./g; # ? > .
s/\*/.*/g; # * > .*
Unfortunately this is broken: It leaves all other characters untouched, including those that have a special meaning in regexes, such as (
, +
, 
, etc.
Let's revise it:
s{(\W)}{
$1 eq '?' ? '.' :
$1 eq '*' ? '.*' :
'\\' . $1
}eg;
Now we match and replace every nonword character. If it's ?
or *
, we turn it into its regex equivalent; otherwise we backslashescape it just like quotemeta
would do.
But what if the input is something like a***b
? This would turn into a.*.*.*b
, which when run on a long target string without b
s by a backtracking engine can be very inefficient (barring extra optimizations). A missing b
would make the match fail at the end, which would cause the engine to go through all possible ways .*.*.*
could subdivide the string amongst themselves before giving up. In general this takes O(n^{k}) time (where n is the length of the target string and k is the number of stars in the pattern).
We can do better than that by realizing **
is equivalent to *
, which means that any sequence of stars is equivalent to a single *
, and preprocessing the pattern:
tr,*,,s; # ***...* > *
This still doesn't fix everything, though: *?*?*
doesn't contain any repeated *
s but still allows for exponential backtracking. One way to work around this is to normalize the pattern even further: Because *?
is equivalent to ?*
, we can move all the ?
s to the front:
# "*?*?*"
1 while s/\*\?/?*/g;
# "?*?**" (after 1 iteration)
# "??***" (after 2 iterations)
tr,*,,s;
# "??*"
s{(\W)}{
$1 eq '?' ? '.' :
$1 eq '*' ? '.*' :
'\\' . $1
}eg;
# "...*"
However, I don't like that the transformation is spread out over two regex substitutions and one transliteration, when there is a way to do it all in a single substitution:
s{
( [?*]+ ) # a run of ? or * characters

(\W) # any other nonword character
}{
defined $1
? '.{' . ($1 =~ tr,?,,) . (index($1, '*') >= 0 ? ',' : '') . '}'
: '\\' . $2
}xeg;
That is, we turn each run of ?
or *
characters into .{N}
(if there was no *
) or .{N,}
(if there was at least one *
) where N
is the number of ?
s in the run.
Given an input of *?*?*
, this would generate .{2,}
("match 2 or more of any character").
And finally, if we wanted the user to be able to escape characters with a backslash to match them literally:
s{
( [?*]+ ) # a run of ? or * characters

\\ (.) # backslash escape

(\W) # any other nonword character
}{
defined $1
? '.{' . ($1 =~ tr,?,,) . (index($1, '*') >= 0 ? ',' : '') . '}'
: quotemeta $+
}xeg;
]]>
$ echo IEabEba  perl try.pl
ORIG: IEabEba
MOD: V~V~V~a~b;~V~~a~~b;;V~V~b~a;~V~~b~~a;;;
...> V~V~V~a~b;~Vab;;V~V~b~a;~Vba;;;
...> V~V~V~a~b;~Vab;;~V~b~a;~Vba;;
Not a tautology
(a equals b) implies (b equals a)
is a tautology but the program
fails to recognize it. With the fixed code below it generates this
output instead:
$ echo IEabEba  perl taut.pl
ORIG: IEabEba
MOD: V~V~V~a~b;~V~~a~~b;;V~V~b~a;~V~~b~~a;;;
...> V~V~V~a~b;~Vab;;V~V~b~a;~Vba;;;
...> V~V~V~a~b;~Vab;;~V~b~a;~Vba;;
...> V~Vba;~V~V~a~b;~Vab;;~V~b~a;;
...> V~Vab;~V~Vab;~V~a~b;;~V~a~b;;
...> V~Vab;~V0~V~a~b;;~V~a~b;;
...> V~Vab;~V~V~a~b;;~V~a~b;;
...> V~Vab;~~V~a~b;~V~a~b;;
...> V~Vab;V~a~b;~V~a~b;;
...> V~Vab;~a~b~V~a~b;;
...> V~Vab;~a~b~V0~b;;
...> V~Vab;~a~b~V~b;;
...> V~Vab;~a~b~~b;
...> V~Vab;~a~bb;
...> V~V1b;~a~bb;
...> V~1~a~bb;
...> V0~a~bb;
...> V~a~bb;
...> V~a~b1;
...> 1
Tautology
The following changes were made to the code:
The "regex libraries" $rawlib
and $modlib
and their named
subpatterns (?&rawexpr)
and (?&exp)
are gone. They were
replaced by $rawexpr
and $exp
, subregexes that directly match
and capture a simplifed and modified expression, respectively.
This change was made in order to make it possible to split a
string into subexpressions using my @expr = $str =~ /$exp/g
(i.e. m//g
in list context).
Two regexes were simplified by using \K
.
The old "final rule" did some duplicate work: It used
index($1, $3) < 0 && index($4, $3) < 0
inside the regex to
search for the subexpression $3
in both $1
and $4
. If
successful, it repeated the search in the replacement part:
s{ \Q$x\E }{$spec}g for $pre, $post;
The new version uses s///
directly in the regex and checks the
return value to see if any match was found/replaced. The
replacement string is assembled directly in the regex and
saved in an outer lexical variable to make it available in the
righthand side.
A new rule was added. If the other rules get stuck, it reorders
V
operands. The canonical order chosen is simply the default
sort
behavior: lexicographically ascending strings.
Due to change #3 and #4 the code no longer works on old perls (before v5.18) because in v5.18 the implementation of embedded code blocks in regexes was rewritten, fixing many bugs.
The new code:
#!/usr/bin/env perl
use v5.18.0;
use warnings;
use re '/xms';
my $rawexpr = qr{
(
(?>
[az]

N (?1)

[CDIE] (?1) (?1)
)
)
};
my $exp = qr{
(
(?>
[01az]

~ (?1)

V (?1)*+ ;
)
)
};
while (readline) {
chomp;
say "ORIG: $_";
1 while s{ E $rawexpr $rawexpr }{DC$1$2CN$1N$2}g;
1 while s{ I $rawexpr $rawexpr }{DN$1$2}g;
1 while s{ C $rawexpr $rawexpr }{NDN$1N$2}g;
1 while s{ D $rawexpr $rawexpr }{V$1$2;}g;
tr/N/~/;
say "MOD: $_";
say "...> $_" while 0
 s{ ~ ~ }{}g
 s{ ~ 0 }{1}g
 s{ ~ 1 }{0}g
 s{ V ; }{0}g
 s{ V $exp ; }{$1}g
 s{ V $exp* \K V ($exp*+) ; }{$2}g
 s{ V $exp* \K 0 }{}g
 s{ V $exp* 1 $exp*+ ; }{1}g
 do {
my $repl;
s{
V ($exp*?) (~??) $exp ($exp*+) ;
(?(?{
my ($pre, $neg, $x, $post) = ($1, $3, $4, $5);
my $spec = $neg ? '1' : '0';
my $n = 0;
$n += s{ \Q$x\E }{$spec}g for $pre, $post;
$repl = "V$pre$neg$x$post;";
!$n
}) (*FAIL) )
}{$repl}g
}
 do {
my $canon;
s{
V ($exp {2,}+) ;
(?(?{
my $orig = $1;
$canon = join '', sort $orig =~ m{ $exp }g;
$orig eq $canon
}) (*FAIL) )
}{V$canon;}g
}
;
say $_ eq '1' ? "Tautology" : "Not a tautology", "\n";
}
__END__
]]>
{# LANGUAGE LambdaCase #}
module Main (main) where
import Control.Applicative (Applicative (..), (<$>))
import Control.Arrow (first)
import Data.Ix (inRange)
import Control.Monad (guard, mplus, msum)
import Data.Maybe (fromMaybe, isJust, listToMaybe)
data RawExpr
= VarR Char
 N RawExpr
 D RawExpr RawExpr
 C RawExpr RawExpr
 I RawExpr RawExpr
 E RawExpr RawExpr
deriving (Eq, Ord, Read, Show)
newtype Parser a = Parser{ unParser :: String > (a, String) }
instance Functor Parser where
fmap f (Parser p) = Parser (first f . p)
instance Applicative Parser where
pure x = Parser (\s > (x, s))
Parser f <*> Parser x = Parser (\s >
let (f', s') = f s
(x', s'') = x s'
in (f' x', s''))
instance Monad Parser where
return = pure
Parser x >>= f = Parser (\s >
let (x', s') = x s
in unParser (f x') s')
anyChar :: Parser Char
anyChar = Parser (\(c : cs) > (c, cs))
parseRaw :: String > RawExpr
parseRaw s = case unParser expr s of
(x, "") > x
(_, rest) > error ("parseRaw: trailing garbage: " ++ show rest)
where
expr = anyChar >>= \case
c  inRange ('a', 'z') c > pure (VarR c)
'N' > N <$> expr
'D' > D <$> expr <*> expr
'C' > C <$> expr <*> expr
'I' > I <$> expr <*> expr
'E' > E <$> expr <*> expr
data ModExpr
= VarM Char
 T
 F
 Not ModExpr
 Or [ModExpr]
deriving (Eq, Ord, Read, Show)
desugar :: RawExpr > ModExpr
desugar = \case
VarR v > VarM v
N x > not (desugar x)
D x y > or (desugar x) (desugar y)
C x y > and (desugar x) (desugar y)
I x y > or (not (desugar x)) (desugar y)
E x y > or (and x' y') (and (not x') (not y'))
where
x' = desugar x
y' = desugar y
where
not = Not
or x y = Or [x, y]
and x y = not (or (not x) (not y))
render :: ModExpr > String
render e = go e "" where
go = \case
VarM c > (c :)
T > ('1' :)
F > ('0' :)
Not x > ('~' :) . go x
Or xs > ('V' :) . foldr (.) id (map go xs) . (';' :)
type Rule = ModExpr > Maybe ModExpr
mkRule :: (Maybe ModExpr > Maybe ModExpr) > Rule
mkRule t = rule
where
rule e = t (Just e) `mplus` rec
where
rec = case e of
Not x > Not <$> rule x
Or xs > do
let ys = map rule xs
guard (any isJust ys)
return (Or (zipWith fromMaybe xs ys))
_ > Nothing
not_not, not_true, not_false :: Rule
not_not = mkRule $ \e > do
Not (Not x) < e
pure x
not_true = mkRule $ \e > do
Not T < e
pure F
not_false = mkRule $ \e > do
Not F < e
pure T
or_true, or_false :: Rule
or_true = mkRule $ \e > do
Or xs < e
guard (T `elem` xs)
pure T
or_false = mkRule $ \e > do
Or xs < e
let ys = filter (F /=) xs
guard (length ys < length xs)
pure $ Or ys
first3 :: (a > r) > (a, b, c) > (r, b, c)
first3 f (x, y, z) = (f x, y, z)
second3 :: (b > r) > (a, b, c) > (a, r, c)
second3 f (x, y, z) = (x, f y, z)
selectSplit :: [a] > [([a], a, [a])]
selectSplit [] = []
selectSplit (x : xs) = ([], x, xs) : map (first3 (x :)) (selectSplit xs)
extract :: (a > Maybe b) > [a] > Maybe ([a], b, [a])
extract f xs = msum (map (evert . second3 f) (selectSplit xs))
where
evert (x, Just y, z) = Just (x, y, z)
evert _ = Nothing
or_none, or_one, or_assoc :: Rule
or_none = mkRule $ \e > do
Or [] < e
pure F
or_one = mkRule $ \e > do
Or [x] < e
pure x
or_assoc = mkRule $ \e > do
Or xs < e
(pre, ys, post) < extractOr xs
pure $ Or (pre ++ ys ++ post)
where
extractOr = extract (\case Or xs > Just xs; _ > Nothing)
isSubExpr :: ModExpr > ModExpr > Bool
isSubExpr e1 e2 =
e1 == e2  case e2 of
Not x > e1 `isSubExpr` x
Or xs > any (e1 `isSubExpr`) xs
_ > False
substitute :: ModExpr > ModExpr > ModExpr > ModExpr
substitute x y = go
where
go = \case
e  e == x > y
Not e > Not (go e)
Or es > Or (map go es)
e > e
or_spec :: Rule
or_spec = mkRule $ \e > do
Or xs < e
listToMaybe $ do
(pre, x_orig, post) < selectSplit xs
(Not x, r) < [(x_orig, T), (Not x_orig, F)]
guard $ any (x `isSubExpr`) (pre ++ post)
let z = Or (map (substitute x r) pre ++ [x_orig] ++ map (substitute x r) post)
pure z
rules :: [Rule]
rules =
[
not_not, not_true, not_false,
or_true, or_false,
or_none, or_one, or_assoc,
or_spec
]
simplify1 :: ModExpr > Maybe ModExpr
simplify1 e = msum (map ($ e) rules)
simplify :: ModExpr > (ModExpr, [String])
simplify e = case simplify1 e of
Nothing > (e, [])
Just e' > (render e' :) <$> simplify e'
process :: String > String
process s = unlines $
("ORIG: " ++ s) :
("MOD: " ++ render e) :
foldr (\x z > ("...> " ++ x) : z) [v, ""] ts
where
e = desugar (parseRaw s)
(r, ts) = simplify e
v  r == T = "Tautology"
 otherwise = "Not a tautology"
main :: IO ()
main = interact (unlines . map process . lines)
]]>
Addendum: I've written a more or less equivalent Haskell version of this program.
Some programmers, when confronted with a boolean expression, think "I know, I'll use regular expressions". Now they have
true
problems.
This post was inspired by the TAUT exercise on SPOJ.
You're given a boolean expression consisting of variables (each being either true
or false
), a unary operator (not
), and several binary operators (and
, or
, implies
, equals
). Your task is to determine whether the expression is a tautology, i.e. whether it evaluates to true
for all possible variable values.
To make parsing easier, we're going to use a very simplified syntax:
a
z
), i.e. an expression can use at most 26 different variablesN
for not
, C
(conjunction) for and
, D
(disjunction) for or
, I
for implies
, E
for equals
x and not y
is written as CxNy
, etc.For example, the expression ((a implies b) and (b implies c)) implies (a implies c)
becomes ICIabIbcIac
.
There is a number of ways to solve this problem, e.g. using brute force to enumerate all possible variable assignments and test the formula, but we're going to use algebraic simplification to determine directly whether a given expression reduces to true
. To make it more fun, we're going to implement it as a string rewriting system using Perl's regex engine.
In the following sections I'm going to take you through the code sections in an order that makes it easier to explain. The full runnable code is at the end.
#!/usr/bin/env perl
use v5.14.0;
use warnings;
use re '/xms', $^V lt v5.18.0 ? 'eval' : ();
The header: The #!
line instructs the system to use whatever perl is in the path. use v5.14.0
requires at least version 14 of perl5 (any earlier version will abort parsing with an error at this point). This enables a few features new in version 14 and also turns on strict
(since v5.12). use warnings
enables warnings  almost always a good idea. use re '/xms'
enables the xms
flags by default for all regexes in the current scope (this feature was introduced in v5.14). Finally, for perls before v5.18, we use re 'eval'
to work around a bug related to parsing of (?{ })
blocks in regexes.
while (readline) {
chomp;
say "ORIG: $_";
We iterate over the lines in ARGV
(formed by the contents of the files listed in @ARGV
or STDIN
if @ARGV
is empty). From each line (stored in $_
) we remove the trailing newline and print it to STDOUT
so we can see what we're working on.
Our next goal is to reduce the number of possible operators in the expression. We do that by rewriting everything in terms of N
(not
) and D
(or
).
The first step is getting rid of E
: What does it mean for two expressions x, y to be equal? Well, either they're both true
or they're both false
, so the first part is simply x and y
. If they're both false
, their negations must be true
, giving us not x and not y
. Putting it all together, X equals Y
can be rewritten as (X and Y) or (not X and not Y)
, which in the simplified syntax is Exy
and DCxyCNxNy
, respectively.
1 while s{ E ((?&rawexpr)) ((?&rawexpr)) $rawlib }{DC$1$2CN$1N$2}g;
I'm going to explain the (?&rawexpr)
and $rawlib
bits in a moment. For now, it suffices to say that (?&rawexpr)
is a regex that matches an expression in the simplified syntax and $rawlib
contains the definition of (?&rawexpr)
.
We have a few spaces in our regex. These are just for readability and ignored because of the /x
flag, which we enabled for all regexes in our main file.
We match E
followed by two subexpressions, which we capture as $1
and $2
. We replace it by DC$1$2CN$1N$2
, which (as explained above) is equivalent. The /g
flag makes s///
replace all occurrences, so what is the outer while
loop for? It's for nested expressions of the form ExEyz
: x
would be matched as $1
and Eyz
as $2
, but we're not recursively replacing E
inside of $1
or $2
, so at the end we still have an instance of E
left. Instead of trying to write some kind of nested recursive substitution, we just wrap the whole thing in a loop that tries again if any match/replacement happened in the previous iteration, until all the E
s are gone.
1 while s{ I ((?&rawexpr)) ((?&rawexpr)) $rawlib }{DN$1$2}g;
Next we get rid of I
. x implies y
is only false
if x
is true
and y
is false
; all other cases are true
. This corresponds to not x or y
, which is DNxy
in the simplified syntax.
1 while s{ C ((?&rawexpr)) ((?&rawexpr)) $rawlib }{NDN$1N$2}g;
Now we get rid of C
using De Morgan's law: not (x and y) = not x or not y
. Adding a not
on both sides gives us not (not (x and y)) = not (not x or not y)
. The two not
s on the left side cancel out, giving x and y = not (not x or not y)
, which is NDNxNy
in the simplified syntax.
Now for the definition of $rawlib
, which makes use of regex features introduced in v5.10:
my $rawlib = qr{
(?(DEFINE)
(?<rawexpr>
(?>
[az]

N (?&rawexpr)

[CDIE] (?&rawexpr) (?&rawexpr)
)
)
)
};
This says the whole regex is in a (?(DEFINE) ... )
block, which has no effect on matching. It merely provides definitions for use in the rest of the regex (i.e. $rawlib
is effectively a regex library). Using (?<rawexpr> ... )
we define a named subpattern called rawexpr
: A rawexpr
is either a character from the range az
, or N followed by a rawexpr
, or one of CDIE
followed by two rawexpr
s. The (?&rawexpr)
syntax lets us invoke named subpatterns recursively.
By including $rawlib
in the regexes above we made the name rawexpr
visible in the surrounding regex, which allowed us to simply write (?&rawexpr)
to match a nontrivial piece of the input.
At this point in the program we've reduced the input expression to nothing but combinations of variables, N
, and D
. To proceed further, we're going to switch from a 2operand or
(D
) to an operator with a variable number of operands (V...;
), i.e. we're going to turn Dxy
into Vxy;
:
1 while s{ D ((?&rawexpr)) ((?&rawexpr)) $rawlib }{V$1$2;}g;
This line works because even though (?&rawexpr)
doesn't match V...;
, regexes always find the leftmost match first, and because we're using prefix notation, the first match will be the outermost expression. This means we're rewriting from the outside in, so any subexpressions are guaranteed to not contain V...;
yet.
tr/N/~/;
Finally (because I think it looks better) we turn all the N
s into ~
s.
say "MOD: $_";
At this point we've completely changed from the raw input form to something using variables, ~
, and V...;
. To confirm everything went well, we output the modified expression.
What follows are the rewrite rules that simplify the expression. We start with another regex library:
my $modlib = qr{
(?(DEFINE)
(?<exp>
(?>
[01az]

~ (?&exp)

V (?&exp)*+ ;
)
)
)
};
A (modified) expression exp
is either 0
or 1
(constants representing false
and true
, respectively), or a variable from the set az
, or ~
followed by an exp
, or V
followed by 0 or more exp
s, followed by ;
.
Now the actual rewrite code:
say "...> $_" while 0
We print the current expression after each substitution step to see what's going on. The 0
after while
lets us start the next line with 
, which makes the substitutions look uniform. We repeatedly try to find matches of our rules until none of the regexes matches anymore. At that point we've finished rewriting (and hopefully simplifying) the expression as much as possible.
 s{ ~ ~ }{}g
All instances of ~~
can be removed (not not
cancels out).
 s{ ~ 0 }{1}g
not false
is true
.
 s{ ~ 1 }{0}g
not true
is false
.
 s{ V ; }{0}g
or []
with no operands is false
(because false
is the neutral element for or
).
 s{ V ((?&exp)) ; $modlib }{$1}g
or [x]
with a single operand is just that operand (x
).
 s{ V ((?&exp)*) V ((?&exp)*) ; $modlib }{V$1$2}g
or [@x, or [@y], @z]
is the same as or [@x, @y, @z]
(using @x
to stand for "a list of expressions called x"). This is essentially an associativity rule that eliminates nested or
s.
 s{ V (?&exp)* 1 (?&exp)*+ ; $modlib }{1}g
or [@x, true, @y]
is true
; i.e. an or
with true
among its operands is itself true
.
 s{ V ((?&exp)*) 0 $modlib }{V$1}g
or [@x, false, @y]
is or [@x, @y]
; i.e. false
can be eliminated if it's an operand of or
.
The final rule is quite a mouthful so I'm going to provide inline commentary:
 s{
V ((?&exp)*?) (~??) ((?&exp)) ((?&exp)*+) ;
We match V
, followed by 0 or more expressions, followed by an (optionally negated) expression, followed by 0 or more expressions, followed by ;
. So far this is really general (it matches all or
s with at least one operand).
(?(?{
index($1, $3) < 0 &&
index($4, $3) < 0
}) (*FAIL) )
This is an instance of the (?(COND)YESPATTERN)
regex syntax, which lets us match a regex conditionally. Our condition has the form (?{ ... })
, which embeds arbitrary Perl code into the regex. If the condition is true, we match the regex (*FAIL)
, which simply fails right away. This is effectively a (negated) assertion: If the condition is false, nothing happens; if it is true, the match fails (and we backtrack).
The whole thing asserts that the subexpression $3
can be found somewhere in either $1
(the list of expressions before $3
) or in $4
(the list of expressions after $3
).
Taken together, we're looking for or [@x, y, @z]
or or [@x, not y, @z]
where y
occurs in @x
or @z
, either directly or nested in a subexpression.
$modlib
}{
my ($pre, $neg, $x, $post) = ($1, $2, $3, $4);
Having found such a match, we can now do a bit of simplification. The basic idea is that in or [@x, y, @z]
, all occurrences of y
in @x
and @z
can be replaced by false
. Why? Because y
is either true
or false
. If it's true
, or [@x, true, @z]
is true
no matter what we do to @x
or @z
. If it's false
, replacing y
by false
is a noop. Therefore replacing y
by false
in @x
and @z
doesn't change the outcome in either case.
Similar reasoning applies to or [@x, not y, @z]
, only we replace y
by true
in @x
and @z
(instead of false
).
my $spec = $neg ? '1' : '0';
If the original match was negated, the replacement is true
, else false
.
s{ \Q$x\E }{$spec}g for $pre, $post;
Replace y
in both @x
and @z
.
"V$pre$neg$x$post;"
Reassemble the expression from the modified pieces.
}ge
;
say $_ eq '1' ? "Tautology" : "Not a tautology", "\n";
}
Finally, if the whole thing was reduced to 1
(i.e. true
), say that it is tautological.
For a practical demonstration, here's the output from the sample input at SPOJ:
ORIG: IIpqDpNp
MOD: V~V~pq;Vp~p;;
...> V~V~pq;p~p;
...> V~V~0q;p~0;
...> V~V1q;p1;
...> 1
Tautology
ORIG: NCNpp
MOD: ~~V~~p~p;
...> Vp~p;
...> Vp~0;
...> Vp1;
...> 1
Tautology
ORIG: Iaz
MOD: V~az;
Not a tautology
ORIG: NNNNNNNp
MOD: ~~~~~~~p
...> ~p
Not a tautology
ORIG: IIqrIIpqIpr
MOD: V~V~qr;V~V~pq;V~pr;;;
...> V~V~qr;~V~pq;V~pr;;
...> V~V~qr;~V~pq;~pr;
...> V~V~qr;~V0q;~pr;
...> V~V~qr;~Vq;~pr;
...> V~V~qr;~q~pr;
...> V~V0r;~q~pr;
...> V~Vr;~q~pr;
...> V~r~q~pr;
...> V~r~q~p1;
...> 1
Tautology
ORIG: Ipp
MOD: V~pp;
...> V~p1;
...> 1
Tautology
ORIG: Ezz
MOD: V~V~z~z;~V~~z~~z;;
...> V~V~z~z;~Vzz;;
...> V~V~z0;~Vz0;;
...> V~V~z;~Vz;;
...> V~~z~z;
...> Vz~z;
...> Vz~0;
...> Vz1;
...> 1
Tautology
The complete program:
#!/usr/bin/env perl
use v5.14.0;
use warnings;
use re '/xms', $^V lt v5.18.0 ? 'eval' : ();
my $rawlib = qr{
(?(DEFINE)
(?<rawexpr>
(?>
[az]

N (?&rawexpr)

[CDIE] (?&rawexpr) (?&rawexpr)
)
)
)
};
my $modlib = qr{
(?(DEFINE)
(?<exp>
(?>
[01az]

~ (?&exp)

V (?&exp)*+ ;
)
)
)
};
while (readline) {
chomp;
say "ORIG: $_";
1 while s{ E ((?&rawexpr)) ((?&rawexpr)) $rawlib }{DC$1$2CN$1N$2}g;
1 while s{ I ((?&rawexpr)) ((?&rawexpr)) $rawlib }{DN$1$2}g;
1 while s{ C ((?&rawexpr)) ((?&rawexpr)) $rawlib }{NDN$1N$2}g;
1 while s{ D ((?&rawexpr)) ((?&rawexpr)) $rawlib }{V$1$2;}g;
tr/N/~/;
say "MOD: $_";
say "...> $_" while 0
 s{ ~ ~ }{}g
 s{ ~ 0 }{1}g
 s{ ~ 1 }{0}g
 s{ V ; }{0}g
 s{ V ((?&exp)) ; $modlib }{$1}g
 s{ V ((?&exp)*) V ((?&exp)*) ; $modlib }{V$1$2}g
 s{ V (?&exp)* 1 (?&exp)*+ ; $modlib }{1}g
 s{ V ((?&exp)*) 0 $modlib }{V$1}g
 s{
V ((?&exp)*?) (~??) ((?&exp)) ((?&exp)*+) ;
(?(?{
index($1, $3) < 0 &&
index($4, $3) < 0
}) (*FAIL) )
$modlib
}{
my ($pre, $neg, $x, $post) = ($1, $2, $3, $4);
my $spec = $neg ? '1' : '0';
s{ \Q$x\E }{$spec}g for $pre, $post;
"V$pre$neg$x$post;"
}ge
;
say $_ eq '1' ? "Tautology" : "Not a tautology", "\n";
}
__END__
]]>