Playing games with outthentic dsl
Outthentic is a language for parsing unstructured text.
It grew up as the supporting language for a web application test tool named swat.
Web applications are where text often arrives in an unstructured and unordered way. Even though there are JSON and
XML, there are a lot of applications where that is not the case.
Later a generic test tool named outthentic was created
as a solution for any text parsing or testing task. This tool is based on the outthentic DSL as well.
Creating new consumers of the outthentic language is easy, with the API exposed and explained in the
outthentic documentation.
What I try to do in this short post is highlight some randomly picked features, to give readers a sense of the
outthentic way to analyze and verify text output, which of course could be widely used in daily testing tasks.
If, after reading this post, you would like to know more, the official outthentic documentation is here
and a less formal introduction is here
Ranges
Sometimes you are given some repetitive lines bounded by some conditions.
A classic example is a table.
Imagine a table with two columns, letters of the alphabet and their position numbers:
Letter Number
A 1
B 2
C 3
...
Z 26
End of table
Let's write some DSL code to verify that:
- the table has 26 rows with 2 cells in each one
- the first cell of every row is a letter of the alphabet and the second one is a number.
First let's verify the basic structure:
between: Letter\s+Number End\s+of\s+table
regexp: ([A-Z]+)\s+(\d+)
end:
With this we have asked the outthentic DSL parser to check that there are letters and numbers inside the range bounded by
the table header and the table footer. Quite easy so far.
Then let's count the table rows.
To do this we need to add some imperative constructions to this otherwise declarative code:
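To make the idea of a range check concrete, here is a rough Python emulation of what the snippet above asks for. It is only a sketch of the semantics as described in this post, not how the outthentic parser is implemented; the helper name lines_in_range is made up for illustration.

```python
import re

def lines_in_range(text, start_re, end_re):
    """Collect the lines found between a start marker line and an end marker line.

    Hypothetical helper: it only mimics the idea of a `between:` range.
    """
    inside, found = False, []
    for line in text.splitlines():
        if not inside and re.search(start_re, line):
            inside = True          # header seen, the range opens
        elif inside and re.search(end_re, line):
            inside = False         # footer seen, the range closes
        elif inside:
            found.append(line)
    return found

# A shortened version of the sample table (3 rows instead of 26).
table = "Letter Number\nA 1\nB 2\nC 3\nEnd of table\n"
rows = lines_in_range(table, r"Letter\s+Number", r"End\s+of\s+table")

# The `regexp:` check, applied only to lines inside the range.
matched = [row for row in rows if re.search(r"([A-Z]+)\s+(\d+)", row)]
print(matched)
```

The header and footer lines themselves are not part of the result, so only the table rows are checked against the regular expression.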
between: Letter\s+Number End\s+of\s+table
regexp: ([A-Z]+)\s+(\d+)
code: our $total_rows++ for @{match_lines()};
validator: [ our $total_rows == 26, 'valid rows number ']
end:
Comments here:
- code: defines Perl code that is executed during the parsing process.
- match_lines() returns a reference to the array of successfully matched lines.
- validator: defines Perl code to be executed; once it has run, its return value is passed as arguments to the Test::More::ok function:
# $r is the array reference returned after execution of the validator: code
ok($r->[0],$r->[1])
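The counting and validation step can be pictured with a small Python stand-in. This is an illustrative sketch under my own assumptions: match_lines is modelled as a plain list, and ok() is a toy function that only imitates the shape of a Test::More::ok call.

```python
import re
import string

# Build the full sample table from the post: header, 26 rows, footer.
rows = [f"{letter} {number}"
        for number, letter in enumerate(string.ascii_uppercase, start=1)]
table_lines = ["Letter Number", *rows, "End of table"]

# Stand-in for match_lines(): lines inside the range that match the regexp.
body = table_lines[1:-1]
match_lines = [line for line in body if re.search(r"([A-Z]+)\s+(\d+)", line)]

def ok(status, message):
    """Toy imitation of a test assertion: report status and message."""
    return f"{'ok' if status else 'not ok'} - {message}"

# Stand-in for the validator: a [status, message] pair handed to ok().
total_rows = len(match_lines)
result = [total_rows == 26, "valid rows number"]
print(ok(*result))
```

The point is the division of labour: the declarative part selects lines, the imperative part counts them, and the validator turns the count into a pass or fail.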
Fine control with captures
The captures() function gives you finer control over the data being checked. It returns
all the chunks captured by the latest regular expression check.
between: Letter\s+Number End\s+of\s+table
regexp: ([A-Z]+)\s+(\d+)
code: \
for my $c (@{captures()}){ \
print $c->[0],'/',$c->[1], "\n"; \
}
end:
The code above will print:
A/1
B/2
C/3
...
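For readers who think in terms of regular expression groups, the same idea looks roughly like this in Python. It is an analogy only: the captures list below is built by hand and is assumed to resemble what captures() returns, one list of captured chunks per matched line.

```python
import re

lines = ["A 1", "B 2", "C 3"]
pattern = re.compile(r"([A-Z]+)\s+(\d+)")

# One list of captured groups per matched line.
captures = [list(m.groups()) for m in map(pattern.search, lines) if m]

for letter, number in captures:
    print(f"{letter}/{number}")
```

Each inner list holds the letter at index 0 and the number at index 1, which is why the Perl code above reads $c->[0] and $c->[1].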
Sequences and generators
Continuous sequences of lines are often a subject of testing when dealing with unstructured text.
Let's rewrite the latest code example using text block expressions:
begin:
A 1
B 2
C 3
# and so on till Z 26
end:
This simple code snippet is an example of a continuous sequence check, where you need to verify that one line is followed by another and so on.
Quite easy, but we need to hardcode all 26 rows, which is not good. Let's rewrite this simple test again using a generator expression:
begin:
generator: my $i = 0; [ map { $i++; "$_ $i" } 'A' .. 'Z' ]
end:
Generators, like code: or validator: expressions, are just pieces of Perl code that get executed.
The return value of the generator code (which should be an array reference) defines new outthentic entities that are then parsed by the outthentic parser.
There is no limit, as a generator could create:
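The combination of a generator and a text block can be sketched in Python as follows. This is a simplified model and not outthentic's own algorithm: both function names are hypothetical, and lines are compared for exact equality here, which may be stricter than what the real parser does.

```python
import string

def generate_expected():
    """Build the expected lines 'A 1' .. 'Z 26', like the generator above."""
    return [f"{letter} {i}"
            for i, letter in enumerate(string.ascii_uppercase, start=1)]

def contains_sequence(lines, expected):
    """True if `expected` appears in `lines` as one uninterrupted run."""
    n = len(expected)
    return any(lines[i:i + n] == expected
               for i in range(len(lines) - n + 1))

expected = generate_expected()
output = ["Letter Number", *expected, "End of table"]
print(contains_sequence(output, expected))
```

If any row is missing, reordered, or separated from its neighbour by another line, the check fails, which is the property a continuous sequence check is meant to guarantee.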
Streams
And finally, a new killer feature of the outthentic DSL called streams.
The stream() function, like the match_lines() function, returns lines successfully matched during the verification process.
But streams add some improvements over match_lines(); they are able to:
Let's look at some trivial text output we need to verify:
<letters>
A
B
C
</letters>
<letters>
D
E
F
G
</letters>
<letters>
H
I
</letters>
Writing the DSL code:
between: <letters> <\/letters>
regexp: [A-Z]
So far so good. Let's add some debugging lines:
between: <letters> <\/letters>
regexp: [A-Z]
code: for my $l (@{match_lines()}) { print "$l\n" }
We get this (which is expected):
A
B
C
D
E
F
G
H
I
What do we see? We lost the group context: all the letters now land in one heap, with no knowledge of their original groups.
Now with the stream() function:
between: <letters> <\/letters>
regexp: [A-Z]
code: \
my $i; \
for my $s (@{stream()}) { \
$i++; \
print "stream # $i\n"; \
for my $l (@{$s}){ \
print "$l\n" \
} \
}
Output:
stream # 1
A
B
C
stream # 2
D
E
F
G
stream # 3
H
I
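The difference between one flat list and grouped streams can be illustrated with a short Python sketch. It only mirrors the output shown above; the streams function is a hypothetical helper and makes no claim about how stream() works internally.

```python
import re

def streams(text, start_re, end_re, line_re):
    """Group matching lines by the range they were found in."""
    groups, current = [], None
    for line in text.splitlines():
        if current is None:
            if re.search(start_re, line):
                current = []            # a new range opens, start a new group
        elif re.search(end_re, line):
            groups.append(current)      # the range closes, keep the group
            current = None
        elif re.search(line_re, line):
            current.append(line)
    return groups

text = ("<letters>\nA\nB\nC\n</letters>\n"
        "<letters>\nD\nE\nF\nG\n</letters>\n"
        "<letters>\nH\nI\n</letters>\n")

grouped = streams(text, r"<letters>", r"</letters>", r"[A-Z]")
for number, group in enumerate(grouped, start=1):
    print(f"stream # {number}")
    for line in group:
        print(line)
```

Flattening grouped gives back the single heap of letters that match_lines() produced, so the grouping is extra information on top of the same matches.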
-- Regards
Alexey Melezhik