Playing games with outthentic dsl
Outthentic is a language for parsing unstructured text. It grew up as a helper for swat, a web application test tool. Web applications often return text in an unstructured and unordered way; JSON and XML exist, but there are plenty of applications that do not use them.
Later, a generic test tool named outthentic was created as a solution for any text parsing or testing task. This tool is based on the outthentic dsl as well.
Creating new consumers of the outthentic language is easy, as the API is exposed and explained in the outthentic documentation.
In this short post I highlight a few randomly picked features to give readers a sense of the outthentic way of analyzing and verifying text output, which can be widely used in daily testing tasks.
If, after reading this post, you would like to know more, the official outthentic documentation is here and a less formal introduction is here.
Ranges
Sometimes you are given repetitive lines bounded by some conditions.
A classic example is a table.
Imagine a table with two columns, letters of the alphabet and their position numbers:
Letter Number
A 1
B 2
C 3
...
Z 26
End of table
Let's write some dsl code to verify that:
- the table has 26 rows with 2 cells in each one
- the first cell of every row is a letter and the second one is a number.
First, let's verify the basic structure:
between: Letter\s+Number End\s+of\s+table
regexp: ([A-Z]+)\s+(\d+)
end:
With this we ask the outthentic dsl parser to check that we have letters and numbers inside the range bounded by the table header and the table footer. Quite easy so far.
Next, let's count the table rows.
To do this we need to add some imperative constructs to this otherwise declarative code:
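To make the range check more concrete, here is a rough plain-Perl sketch of the idea. It does not use outthentic itself: the helper name rows_in_range and the sample text are hypothetical, and the real parser may treat edge cases differently.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of outthentic: return the lines that sit
# between a header pattern and a footer pattern and match a row pattern.
sub rows_in_range {
    my ($text, $start_re, $end_re, $row_re) = @_;
    my $inside = 0;
    my @rows;
    for my $line (split /\n/, $text) {
        if (!$inside) {
            # Wait for the header line that opens the range.
            $inside = 1 if $line =~ $start_re;
            next;
        }
        if ($line =~ $end_re) {
            # The footer line closes the range.
            $inside = 0;
            next;
        }
        push @rows, $line if $line =~ $row_re;
    }
    return \@rows;
}

my $sample = join "\n", 'Letter Number', 'A 1', 'B 2', 'C 3', 'End of table', 'Z 26';
my $rows = rows_in_range(
    $sample,
    qr/Letter\s+Number/,
    qr/End\s+of\s+table/,
    qr/([A-Z]+)\s+(\d+)/,
);
print "$_\n" for @$rows;    # the trailing 'Z 26' lies outside the range
```

In this sketch the header and footer lines themselves are never reported as rows; only the lines between them are tested against the row pattern.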
between: Letter\s+Number End\s+of\s+table
regexp: ([A-Z]+)\s+(\d+)
code: our $total_rows++ for @{match_lines()};
validator: [ our $total_rows == 26, 'valid rows number ']
end:
A few comments:
- code: expressions define Perl code that is executed during the parsing process.
- The match_lines() function returns an array reference of the successfully matched lines.
- validator: expressions also define Perl code to execute; once it has run, its return value is passed as arguments to the Test::More::ok function:
# $r - is array reference returned after execution of :validator code
ok($r->[0],$r->[1])
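As a minimal sketch of that hand-off, the snippet below builds a [ status, message ] pair in plain Perl and passes it to Test::More::ok. The function row_count_validator and the fake matched lines are hypothetical stand-ins; only the ok() call shape comes from the description above.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for a validator: expression. It returns an array
# reference holding [ status, message ].
sub row_count_validator {
    my ($rows, $expected) = @_;
    return [ scalar(@$rows) == $expected, 'valid rows number' ];
}

# Fake "matched lines": 'A 1' .. 'Z 26'.
my $n = 0;
my @matched = map { $n++; "$_ $n" } 'A' .. 'Z';

my $r = row_count_validator(\@matched, 26);
ok($r->[0], $r->[1]);    # same call shape as shown above
done_testing();
```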
Fine control with captures
The captures() function gives you finer control over the data being checked. It returns all the chunks captured by the latest regular expression check.
between: Letter\s+Number End\s+of\s+table
regexp: ([A-Z]+)\s+(\d+)
code: \
for my $c (@{captures()}){ \
print $c->[0],'/',$c->[1], "\n"; \
}
end:
The code above will print:
A/1
B/2
C/3
...
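For readers who want to see the data shape outside of the dsl, here is a small plain-Perl sketch that builds a similar structure: one array reference of captured groups per matched line. The helper collect_captures is hypothetical and only imitates the idea, not the outthentic implementation.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of outthentic: for every line matching $re,
# keep the captured groups as one array reference.
sub collect_captures {
    my ($lines, $re) = @_;
    my @captures;
    for my $line (@$lines) {
        my @groups = $line =~ $re;    # list context returns the captured groups
        push @captures, \@groups if @groups;
    }
    return \@captures;
}

my $caps = collect_captures(
    [ 'A 1', 'B 2', 'no match here', 'C 3' ],
    qr/([A-Z]+)\s+(\d+)/,
);
print $_->[0], '/', $_->[1], "\n" for @$caps;
```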
Sequences and generators
Continuous sequences of lines are often a subject of testing when dealing with unstructured text.
Let's rewrite the latest code example using text block expressions:
begin:
A 1
B 2
C 3
# and so on till Z 26
end:
This simple code snippet is an example of a continuous sequence check, where you need to verify that one line is followed by another, and so on.
Quite easy, but we need to hardcode all 26 rows, which is not good. Let's rewrite this simple test again using a generator expression:
begin:
generator: my $i; [ map { $i++; "$_ $i" } 'A' .. 'Z' ]
end:
Generators, like code: or validator: expressions, are just pieces of Perl code that get executed.
The return value of the generator code (it should be an array reference) defines new outthentic entities, which are then parsed by the outthentic parser.
There is no limit here, as a generator can create:
- new check expressions
- validator expressions
- code expressions
- ... and generator expressions; a sophisticated example is described here
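To see what such a generator hands back, the plain-Perl sketch below builds the array reference of 26 plain-text checks on its own. The function name letter_checks is hypothetical; in the dsl the same expression would simply be the body of a generator: line.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the body of a generator: expression. It returns
# an array reference of plain-text check expressions, 'A 1' through 'Z 26'.
sub letter_checks {
    my $i = 0;
    return [ map { $i++; "$_ $i" } 'A' .. 'Z' ];
}

my $checks = letter_checks();
print scalar(@$checks), " checks generated\n";
# Show the first three checks and the last one.
print "$_\n" for @{$checks}[0 .. 2], $checks->[-1];
```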
Streams
And finally, the new killer feature of the outthentic dsl: streams.
The stream() function, like match_lines(), returns lines successfully matched during the verification process.
But streams improve on match_lines(); they are able to:
- accumulate data (match_lines() always relates to the latest check)
- group data (see the example below)
Let's look at a trivial text output we need to verify:
<letters>
A
B
C
</letters>
<letters>
D
E
F
G
</letters>
<letters>
H
I
</letters>
Writing the dsl code:
between: <letters> <\/letters>
regexp: [A-Z]
So far so good. Let's add some debugging lines:
between: <letters> <\/letters>
regexp: [A-Z]
code: for my $l (@{match_lines()}) { print "$l\n" }
We get this (which is expected):
A
B
C
D
E
F
G
H
I
What do we see? We lost the group context: all the letters now land in one heap, with no knowledge of their original groups.
Now with the stream() function:
between: <letters> <\/letters>
regexp: [A-Z]
code: \
my $i; \
for my $s (@{stream()}) { \
$i++; \
print "stream #$i\n"; \
for my $l (@{$s}){ \
print "$l\n" \
} \
}
Output:
stream #1
A
B
C
stream #2
D
E
F
G
stream #3
H
I
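The grouping idea itself can be sketched in plain Perl, independent of outthentic. The helper grouped_matches below is hypothetical: it only illustrates keeping one array reference per <letters> block, and makes no claim about how stream() is implemented.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of outthentic: collect matching lines into
# one array reference per range, so that group boundaries are preserved.
sub grouped_matches {
    my ($text, $start_re, $end_re, $row_re) = @_;
    my (@groups, $current);
    for my $line (split /\n/, $text) {
        if (!defined $current) {
            # An opening line starts a new, empty group.
            $current = [] if $line =~ $start_re;
            next;
        }
        if ($line =~ $end_re) {
            # A closing line finishes the current group.
            push @groups, $current;
            undef $current;
            next;
        }
        push @$current, $line if $line =~ $row_re;
    }
    return \@groups;
}

my $sample = join "\n",
    '<letters>', 'A', 'B', 'C', '</letters>',
    '<letters>', 'D', 'E', 'F', 'G', '</letters>',
    '<letters>', 'H', 'I', '</letters>';

my $groups = grouped_matches($sample, qr/<letters>/, qr{</letters>}, qr/[A-Z]/);
my $i = 0;
for my $g (@$groups) {
    $i++;
    print "group #$i: @$g\n";
}
```

Flattening the groups into one list would give the match_lines()-style view shown earlier, with the group context lost.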
-- Regards
Alexey Melezhik
Is your stream example code correct? It looks the same as the example above it, with "between", "regexp:", and "code:"; why is @{match_lines} an array of scalars in the first but becomes an array of arrayrefs in the second?
Thanks for pointing this! The example should be: