Playing games with outthentic dsl

Outthentic is a language for parsing unstructured text. It grew out of swat, a web application test tool. Web applications often return text in an unstructured and unordered form: even though JSON and XML exist, there are plenty of applications where the output is neither.

Then a generic test tool named outthentic was created as a solution for any text parsing and testing task. This tool is also based on the outthentic DSL.

Creating new consumers of the outthentic language is easy: the API is exposed and explained in the outthentic documentation.

In this short post I highlight a few randomly picked features to give readers a sense of the outthentic way of analyzing and verifying text output, which can of course be widely used in daily testing tasks.

If, after reading this post, you would like to know more, the official outthentic documentation is here, and a less formal introduction is here.

Ranges

Sometimes you are given repetitive lines bounded by certain conditions.

A classic example is a table.

Imagine a table with two columns: the letters of the alphabet and their position numbers:

Letter  Number
A       1
B       2
C       3
...
Z       26
End of table

Let's write some DSL code to verify that:

  • the table has 26 rows with 2 cells in each one
  • the first cell of every row is a letter and the second one is a number.

First, let's verify the basic structure:

between: Letter\s+Number End\s+of\s+table
    regexp: ([A-Z]+)\s+(\d+)
end:

With this code we ask the outthentic DSL parser to check that there are letters and numbers inside the range bounded by the table header and the table footer. Quite easy so far.
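To make the idea of a range concrete, here is a plain-Perl sketch of what a between: check does conceptually. It collects the lines strictly between a header regexp and a footer regexp, then tests each of them against the row regexp. This is an illustration only and not how the outthentic parser is implemented. The lines_between() helper is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines strictly between the first line
# matching $start and the next line matching $stop.
sub lines_between {
    my ($text, $start, $stop) = @_;
    my (@inside, $in_range);
    for my $line (split /\n/, $text) {
        if (!$in_range) {
            $in_range = 1 if $line =~ $start;   # header found, range opens
            next;
        }
        last if $line =~ $stop;                 # footer found, range closes
        push @inside, $line;
    }
    return \@inside;
}

# Build the A..Z table from the example ("A       1" ... "Z       26").
my $table = join "\n",
    'Letter  Number',
    (map { sprintf '%-8s%d', $_, ord($_) - ord('A') + 1 } 'A' .. 'Z'),
    'End of table';

my $rows  = lines_between($table, qr/Letter\s+Number/, qr/End\s+of\s+table/);
my @match = grep { /([A-Z]+)\s+(\d+)/ } @$rows;
print scalar(@match), " matching rows\n";   # 26 matching rows
```

Only lines inside the header/footer range are considered. Text outside the table therefore cannot satisfy the row check.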

Next, let's count the table rows.

To do this we need to add some imperative constructs to this otherwise declarative code:

between: Letter\s+Number End\s+of\s+table
    regexp: ([A-Z]+)\s+(\d+)
    code: our $total_rows++ for @{match_lines()};
    validator: [ our $total_rows == 26, 'valid rows number' ]
end:

Some notes:

  • code: expressions define Perl code that is executed during the parsing process.

  • The match_lines() function returns a reference to an array of successfully matched lines.

  • validator: expressions define Perl code to be executed; its return value is then passed as arguments to the Test::More::ok function:

# $r is the array reference returned by the validator: code

ok($r->[0],$r->[1])
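The validator contract can be mimicked in plain Perl. The validator body returns an array reference of [condition, description], and that pair is handed to Test::More::ok. The sketch below assumes @matched holds what match_lines() would return. The data and the $validator code reference are illustrative and not outthentic internals.

```perl
use strict;
use warnings;
use Test::More;

# Stand-in for match_lines(): illustrative rows "A 1" .. "Z 26".
my @matched = map { "$_ " . (ord($_) - 64) } 'A' .. 'Z';

# What the code: expression does: count the matched rows.
our $total_rows = 0;
$total_rows++ for @matched;

# What the validator: expression does: return [ condition, description ].
my $validator = sub { [ $total_rows == 26, 'valid rows number' ] };

# The returned pair is passed to Test::More::ok.
my $r = $validator->();
ok($r->[0], $r->[1]);

done_testing();
```

Running it prints a TAP line, `ok 1 - valid rows number`, followed by the plan `1..1`.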

Fine control with captures

The captures() function gives you finer control over the data being checked. It returns all the chunks captured by the latest regular expression check.

between: Letter\s+Number End\s+of\s+table
    regexp: ([A-Z]+)\s+(\d+)
    code:                                   \
    for my $c (@{captures()}){              \
        print $c->[0],'/',$c->[1], "\n";    \
    }
end:

The code above will print:

A/1
B/2
C/3
...
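The shape of the data returned by captures() can be reproduced with ordinary Perl list-context matching. That shape is one array reference of capture groups per matched line. collect_captures() below is a hypothetical stand-in, written only to show that shape.

```perl
use strict;
use warnings;

# Hypothetical stand-in for captures(): for each line matching $re,
# keep an array reference holding that line's capture groups.
sub collect_captures {
    my ($re, @lines) = @_;
    my @captures;
    for my $line (@lines) {
        my @groups = $line =~ $re;   # list context returns the groups
        push @captures, \@groups if @groups;
    }
    return \@captures;
}

my @rows     = map { sprintf '%-8s%d', $_, ord($_) - 64 } 'A' .. 'C';
my $captures = collect_captures(qr/([A-Z]+)\s+(\d+)/, @rows);

# Prints A/1, B/2 and C/3 on separate lines.
for my $c (@$captures) {
    print $c->[0], '/', $c->[1], "\n";
}
```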

Sequences and generators

Sequences of consecutive lines are often the subject of testing when dealing with unstructured text.

Let's rewrite the previous example using a text block expression:

begin:
    A 1
    B 2
    C 3
    # and so on till Z 26
end:

This simple snippet is an example of a continuous sequence check, used when you need to verify that one line is followed by another, and so on.
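Conceptually, a text block check succeeds when the expected lines appear one right after another. The sketch below assumes each expected string is matched as a substring of the corresponding line. has_sequence() is a hypothetical helper and not part of outthentic.

```perl
use strict;
use warnings;

# Hypothetical helper: true if @$expected occur as consecutive lines
# in @$lines, each expected string being a substring of its line.
sub has_sequence {
    my ($lines, $expected) = @_;
    my $n = @$expected;
    START: for my $i (0 .. @$lines - $n) {
        for my $j (0 .. $n - 1) {
            # Any mismatch: try the next starting position.
            next START if index($lines->[$i + $j], $expected->[$j]) < 0;
        }
        return 1;
    }
    return 0;
}

my @output = ('A 1', 'B 2', 'C 3');
print has_sequence(\@output, ['A 1', 'B 2', 'C 3']) ? "ok\n" : "not ok\n";   # ok
print has_sequence(\@output, ['A 1', 'C 3'])        ? "ok\n" : "not ok\n";   # not ok
```

The second call fails because "C 3" does not immediately follow "A 1", even though both lines are present.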

Writing the check is easy, but the begin: block hardcodes all 26 rows, which is not good. Let's rewrite this simple test again using a generator expression:

begin:
    generator: [ map { $_ . ' ' . (ord($_) - ord('A') + 1) } 'A' .. 'Z' ]
end:

Generators, like code: and validator: expressions, are just pieces of Perl code being executed.

The return value of the generator code (which should be an array reference) defines new outthentic entities, which then get parsed by the outthentic parser.
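Because a generator is plain Perl, you can run a generator-style expression on its own to see exactly which DSL lines it would produce. The snippet below evaluates such an expression outside of outthentic. It assumes only that a generator must return an array reference whose elements become new DSL lines.

```perl
use strict;
use warnings;

# Generator-style expression: an array reference of DSL lines,
# here plain check expressions "A 1" .. "Z 26".
my $generated = [ map { $_ . ' ' . (ord($_) - ord('A') + 1) } 'A' .. 'Z' ];

print scalar(@$generated), "\n";   # 26
print "$generated->[0]\n";         # A 1
print "$generated->[-1]\n";        # Z 26
```

Checking the list this way is a cheap sanity test before handing the expression to the parser. For example, it makes it obvious if an unintended undef element slips into the array.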

There is no limit here, as a generator can create:

  • new check expressions

  • validator expressions

  • code expressions

  • ... and even generator expressions; a sophisticated example is described here

Streams

And finally, a new killer feature of the outthentic DSL: streams.

The stream() function, like match_lines(), returns lines successfully matched during the verification process.

But streams improve on match_lines() in that they can:

  • accumulate data (match_lines() always relates to the latest check)

  • group data (see the example below)

Let's look at a trivial text output we need to verify:

<letters>
    A
    B
    C
</letters>

<letters>
    D
    E
    F
    G
</letters>

<letters>
    H
    I
</letters>

Writing the DSL code:

between: <letters> <\/letters>
    regexp: [A-Z]
end:

So far so good. Let's add some debugging lines:

between: <letters> <\/letters>
    regexp: [A-Z]
    code: for my $l (@{match_lines()}) { print "$l\n" }
end:

We get this (which is obvious):

A
B
C
D
E
F
G
H
I

What do we see? We have lost the group context: all the letters end up in one heap, with no knowledge of their original groups.

Now with the stream() function:

between: <letters> <\/letters>
    regexp: [A-Z]
    code:                               \
    my $i = 0;                          \
    for my $s (@{stream()}) {           \
        $i++;                           \
        print "stream #$i\n";           \
        for my $l (@{$s}) {             \
            print "$l\n";               \
        }                               \
    }
end:

Output:

stream #1
A
B
C
stream #2
D
E
F
G
stream #3
H
I
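The grouping that stream() provides can be imitated in plain Perl. Open a new group at each <letters> line and close it at each </letters> line. This is a hypothetical sketch of the idea, not the outthentic implementation, and the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = <<'END';
<letters>
    A
    B
    C
</letters>

<letters>
    D
    E
    F
    G
</letters>

<letters>
    H
    I
</letters>
END

# One array reference per <letters> ... </letters> block,
# holding the letters matched inside that block.
my (@streams, $current);
for my $line (split /\n/, $text) {
    if ($line =~ /<letters>/)   { $current = []; next }                          # open a group
    if ($line =~ /<\/letters>/) { push @streams, $current; undef $current; next } # close it
    next unless $current;                  # skip lines outside any block
    push @$current, $1 if $line =~ /([A-Z])/;
}

# Same printing loop as the DSL example: the groups survive.
my $i = 0;
for my $s (@streams) {
    $i++;
    print "stream #$i\n";
    print "$_\n" for @$s;
}
```

A single flat list, which is what match_lines() gives you, would hold the same nine letters but not the three block boundaries.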

-- Regards

Alexey Melezhik

About melezhik

Dev & Devops --- Then I beheld all the work of God, that a man cannot find out the work that is done under the sun: because though a man labour to seek it out, yet he shall not find it; yea further; though a wise man think to know it, yet shall he not be able to find it. (Ecclesiastes 8:17)